MongoEngine / mongoengine

A Python Object-Document-Mapper for working with MongoDB
http://mongoengine.org
MIT License
4.24k stars 1.23k forks source link

Never writing default values in DB if key not present #1756

Open sknat opened 6 years ago

sknat commented 6 years ago

Default values are never written to DB if containing prop was removed before:

import mongoengine

class Embdoc(mongoengine.EmbeddedDocument):
    prop = mongoengine.IntField(default=0)

class Doc(mongoengine.Document):
    emb = mongoengine.EmbeddedDocumentField(Embdoc, default=Embdoc)

    meta = {
        "db_alias": "db_name",
        "collection": "collection"
    }

mongoengine.connection.register_connection('db_name',name='db_name',host='localhost')
collection = mongoengine.connection.get_db('db_name').collection

collection.remove()
o = Doc()
o.save()
print(o.emb.prop)                             # 0
print(collection.find().limit(1)[0])          # {u'_id': ObjectId('5abaaaaaaaaaaaaaaaaaaaaa'), u'emb': {u'prop': 0}}
collection.update({}, {"$unset": {"emb":1}})
o.reload()
o.save()
print(o.emb.prop)                             # 0
print(collection.find().limit(1)[0])          # {u'_id': ObjectId('5abaaaaaaaaaaaaaaaaaaaaa')}

# Even so
o.emb.prop = 0
o.save()
print(o.emb.prop)                             # 0
print(collection.find().limit(1)[0])          # {u'_id': ObjectId('5abaaaaaaaaaaaaaaaaaaaaa')}

Funny isn't it ?

python -c "import mongoengine; print(mongoengine.__version__)" -> 0.15.0
ubuntu 17.10 artful
kernel 4.13.0
erdenezul commented 6 years ago

$unset operator will deletes a particular field as document says. Moreover, if you set default value like 0, mongoengine assumes nothing changed in database and save method does really nothing. I think it is correct behavior. How mongoengine can know you unsetted(deleted) particular field?

JohnAD commented 6 years ago

A philosophical way of stating it: if you bypass MongoEngine, you can indeed fool MongoEngine.

Two solutions:

  1. Don't circumvent MongoEngine. Rather than:
    collection.update({}, {"$unset": {"emb":1}})

    use

    Doc.objects.update(unset__emb=1)
  2. Or re-invoke MongoEngine to make it fresh. Rather than:
    collection.update({}, {"$unset": {"emb":1}})
    o.reload()

    use

    collection.update({}, {"$unset": {"emb":1}})
    o = Doc.objects(id=o.id).first()
sknat commented 6 years ago

@JohnAD @erdenezul Thanks for your quick replies.

class Doc(mongoengine.Document): emb = mongoengine.EmbeddedDocumentField(Embdoc, default=Embdoc)

meta = {
    "db_alias": "db_name",
    "collection": "collection"
}

mongoengine.connection.register_connection('db_name',name='db_name',host='localhost') collection = mongoengine.connection.get_db('db_name').collection

collection.remove() o = Doc() o.save() print(o.emb.prop) # 1521621076.49 print(collection.find().limit(1)[0]) # {u'_id': ObjectId('5abaaaaaaaaaaaaaaaaaaaaa'), u'emb': {u'prop': 1521621076}} collection.update({}, {"$unset": {"emb":1}}) time.sleep(1) o = Doc.objects(id=o.id).first() print(o.emb.prop) # 1521621077.5 (not the same as before) print(collection.find().limit(1)[0]) # {u'_id': ObjectId('5abaaaaaaaaaaaaaaaaaaaaa')}

JohnAD commented 6 years ago

Ah, I discuss this tricky bit in one of my MongoEngine Field videos: MongoEngine Fields Ep. 3b at 10:40s

Passing a function, as opposed to a value, is not well documented or explained, IMO. But the rule is this:

So, you will want to change your Embdoc class definition to:

class Embdoc(mongoengine.EmbeddedDocument):
    prop = mongoengine.IntField(default=time.time())

That will cause time.time() to run during the definition of the class and return a value. Now, the prop's default is locked to that static value.

Ironically, many folks are tripped up in the opposite direction. Something like:

"Every time my python web server creates a new Document it has the exact same time saved; the time when the server started, not when the document was created."

And that is because they mistakenly did this:

class MyDoc(Document):
    time_created = DateTimeField(default = datetime.datetime.now())

rather than this:

class MyDoc(Document):
    time_created = DateTimeField(default = datetime.datetime.now)
JohnAD commented 6 years ago

@sknat , I just re-read this thread; and I might not have addressed your specific desire: are you wanting MongoEngine to automatically detect that something, or someone else, has modified the document and override that change and restore the original data? A few questions:

  1. Where would you like the knowledge of the original data stored? In the RAM of the current python session? In another collection? In the original document; hidden away somewhere?

  2. If in the RAM of the current session, would having a method command that saves the whole document, regardless of what has really been changed, suffice? Something like adding a "o.brute_save()". Such a method would successfully prevent another actor from working on the same document at the same time as the other changes or additions are wiped out.

  3. What is the application for this?

bagerard commented 6 years ago

I think what @sknat is asking for is to have mongoengine detecting that an object that got reloaded from the db uses a default value that is not in the pymongo document, and track that in _changed_fields so that it would get pushed to the db in case of .save().

This is an issue that we also have after running model migrations but I'm not sure it would be a good idea to go this way. I believe the right way is to migrate the fields to the default values manually.

gordol commented 6 years ago

I think what @sknat is asking for is to have mongoengine detecting that an object that got reloaded from the db uses a default value that is not in the pymongo document, and track that in _changed_fields so that it would get pushed to the db in case of .save().

I think we should have a flag to set the expected behavior any time a document is loaded into the models with a mismatched pymongo document. Even if default behavior is a warning, that would be better than nothing. But, what I'd like to see is a flag that gives the ability to ignore the variance, or force the value into the database when saving.

I believe the right way is to migrate the fields to the default values manually.

The problem is basically that you can write migration scripts, sure, but if you do it in mongoengine with property setters on the object, you have to first set all the fields to a non-default value, save(), then set them back to the model default and save again.

The only alternative is to write an update() query instead of using getters/setters on the object, which seems to circumvent the dirty checking in mongoengine. However, in another issue related to this one, you warned against using .update(). So, aside from not using mongoengine to do the migration, or saving twice w/ an intermediate value, or using update(), what do you suggest?

Basically, a flag to control the dirty checking behavior during data divergence might be necessary to facilitate keeping all users happy, where some expect different behavior than others.

sknat commented 5 years ago

Sorry all, I left this thread dead for a while. @gordol I do agree, I was thinking about doing something like that,

set the expected behavior any time a document is loaded into the models

i.e. when a key is missing in the DB json document, get the default value from the function/value given in the model.

To add a bit more context on what is happening and what can be very troubling, take the following example :

import mongoengine
mongoengine.connection.register_connection('db_name',name='db_name',host='localhost')

class Doc(mongoengine.Document):
    lst = mongoengine.ListField(mongoengine.IntField(), default=list)

    meta = {
        "db_alias": "db_name",
        "collection": "collection"
    }

collection = mongoengine.connection.get_db('db_name').collection

doc = Doc()
doc.save()

print '#1'
print doc.lst
print(collection.find({'_id': doc.id})[0])

doc.lst.append(1)
doc.save()

print '#2'
print doc.lst
print(collection.find({'_id': doc.id})[0])

doc.lst = []
doc.save()

print '#3'
print doc.lst
print(collection.find({'_id': doc.id})[0])

Which gives

#1
[]
{u'lst': [], u'_id': ObjectId('5be9bf06330c0354fa0fa581')}
#2
[1]
{u'lst': [1], u'_id': ObjectId('5be9bf06330c0354fa0fa581')}
#3
[]
{u'_id': ObjectId('5be9bf06330c0354fa0fa581')}

So setting back the default value unsets the attribute ? And we are currently saying in this thread that an attribute that has been unset will never be set again to the default value ?

That also gets very tricky when you get into queries like Doc.objects(lst=[]) you'll basically end up having different results whether you're at step (1) or (3) in the exemple. So as of now all the queries for 'give me all documents with the default value of that prop' should be written Doc.objects(Q(lst=[]) | Q(lst=None)).

bekab95 commented 5 years ago

is not this fixed ?