MongoEngine / mongoengine

A Python Object-Document-Mapper for working with MongoDB
http://mongoengine.org
MIT License
4.24k stars 1.23k forks source link

Confusing behaviour when ListFields are set to empty list #267

Open gareth-lloyd opened 11 years ago

gareth-lloyd commented 11 years ago
import mongoengine
class TestDoc(mongoengine.Document):
    list_field = mongoengine.ListField()

mongoengine.connect('test')
test_doc = TestDoc.objects.create()

After this operation, in spite of the fact that list_field was not given any value, the BSON document in the collection has a corresponding field initialized to an empty list:

cluster-1:PRIMARY> use test
switched to db test
cluster-1:PRIMARY> db.test_doc.find()                                                                                              
{ "_id" : ObjectId("515c24a4c02c1c58d59f02c0"), "_types" : [ "TestDoc" ], "list_field" : [ ], "_cls" : "TestDoc" }   

I understand why initializing to empty list could be a good idea - it allows subsequent list operations on that field to succeed. However, in the light of this, the following behaviour seems wrong to me:

test_doc.list_field = []
test_doc.save()

This call causes mongoengine to issue an $unset command, removing list_field from the BSON document:

cluster-1:PRIMARY> db.test_doc.find()                                                                                              
{ "_id" : ObjectId("515c24a4c02c1c58d59f02c0"), "_types" : [ "TestDoc" ], "_cls" : "TestDoc" }

So ListFields, unlike other field types, get initialized to empty lists when no value is given, but setting a ListField to empty list causes it to be removed from the underlying BSON document.

tarass commented 11 years ago

I would much rather lists not be auto serialized as empties into the database. Though, it is a bit of a consistency issue with how mongo handles lists, especially as they apply to indexes. Currently mongo treats empty array as A value and adds it to even a sparse index. You don't need an array to exist before you $push to an array, but $pull leaves an empty array in place.

That said I would still rather have a non-existant entry until content is added.

rozza commented 11 years ago

This issue seesaws alot - once 0.8 is out I will review for 0.9

mmellison commented 9 years ago

I would also like to see this changed to be more consistent. There are situations where I am removing entries from a list, but require an empty list to be kept in the database.

As of right now, I have to detect if I am removing the last element from the list and if so, use the update method to set the field as an empty list.

MRigal commented 9 years ago

@seglberg I'll remove the 0.10 milestone on this one, when you have time, you're very welcome to submit something on the topic

MRigal commented 9 years ago

A similar case is also the issue #938

wojcikstefan commented 7 years ago

This is not an issue anymore - test_doc.list_field = [] doesn't unset the list in the database:

In [3]: class Doc(Document):
        nums = ListField(IntField())
   ...:

In [4]: doc = Doc.objects.create()

In [5]: Doc._get_collection().find_one({ '_id': doc.pk })
Out[5]: {u'_id': ObjectId('58448163c6e47b87154ecc6c'), u'nums': []}

In [6]: doc.nums = []

In [7]: doc.save()
Out[7]: <Doc: Doc object>

In [8]: Doc._get_collection().find_one({ '_id': doc.pk })
Out[8]: {u'_id': ObjectId('58448163c6e47b87154ecc6c'), u'nums': []}
andrewsosa commented 7 years ago

Hello! Is this fix included in release v0.11.0? I'm still having this issue on the latest release, also with flask-mongoengine 0.9.2. Thanks!

I'm using a ListField of ReferenceFields.

wojcikstefan commented 7 years ago

@andrewsosa001 could you provide some code that reliably reproduces your issue?

andrewsosa commented 7 years ago

@wojcikstefan Here's a sample script I wrote to demonstrate:

from app import db # Import MongoEngine instance from Flask app instance. 

class Doc(db.Document):
    refs = db.ListField(db.ReferenceField('Doc'))

# Create some docs
d0 = Doc().save()
d1 = Doc().save()
d2 = Doc().save()

# How many Docs have refs = []
print Doc.objects.filter(refs__size=0).count() # prints 3

# Add references for d1 and d2 to d0.
d0.refs.append(d1)
d0.refs.append(d2)
d0.save()

# Check reference counts in DB
print Doc.objects.filter(refs__size=0).count() # prints 2
print Doc.objects.filter(refs__size=2).count() # prints 1

# Remove the references from d0
d0.refs.pop()
d0.refs.pop()
d0.save()

# Check reference count in DB
print Doc.objects.filter(refs__size=0).count()       # prints 2
print Doc.objects.filter(refs__exists=False).count() # prints 1

I've also visually inspected my MongoDB instance using Robomongo to confirm my findings.

Thanks!

wojcikstefan commented 7 years ago

Ah, thanks @andrewsosa001, that makes it clearer. Proof I have shown in https://github.com/MongoEngine/mongoengine/issues/267#issuecomment-264730194 is invalid then - it only worked because I was changing the value from an implicit [] to an explicit [], which doesn't result in a changed field and hence doesn't update it in the MongoDB doc. A quick test reveals that any operation that actually updates the list to be empty unsets it from the MongoDB doc:

In [35]: from mongoengine import *

In [36]: connect('testdb')
Out[36]: MongoClient('localhost', 27017)

In [37]: class Doc(Document):
    ...:     nums = ListField(IntField())
    ...:

In [38]: doc = Doc.objects.create()

In [39]: Doc._get_collection().find_one({ '_id': doc.pk })  # empty list as expected
Out[39]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d'), u'nums': []}

In [40]: doc.nums.append(1)

In [41]: doc.save()
Out[41]: <Doc: Doc object>

In [42]: Doc._get_collection().find_one({ '_id': doc.pk })  # non-empty list as expected
Out[42]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d'), u'nums': [1]}

In [43]: doc.nums = []

In [44]: doc.save()
Out[44]: <Doc: Doc object>

In [45]: Doc._get_collection().find_one({ '_id': doc.pk })  # WRONG, expected `'nums': []`
Out[45]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d')}

In [46]: doc.nums.append(1)

In [47]: doc.save()
Out[47]: <Doc: Doc object>

In [48]: Doc._get_collection().find_one({ '_id': doc.pk })  # non-empty list as expected
Out[48]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d'), u'nums': [1]}

In [49]: doc.nums.pop()
Out[49]: 1

In [50]: doc.save()
Out[50]: <Doc: Doc object>

In [51]: Doc._get_collection().find_one({ '_id': doc.pk })  # WRONG, expected `'nums': []`
Out[51]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d')}

In [53]: doc.nums.append(1)

In [54]: doc.save()
Out[54]: <Doc: Doc object>

In [55]: Doc._get_collection().find_one({ '_id': doc.pk })  # non-empty list as expected
Out[55]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d'), u'nums': [1]}

In [56]: doc.nums.remove(1)

In [57]: doc.save()
Out[57]: <Doc: Doc object>

In [58]: Doc._get_collection().find_one({ '_id': doc.pk })  # WRONG, expected `'nums': []`
Out[58]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d')}
wojcikstefan commented 7 years ago

Now, one last question we have to answer is why we expect an empty list to be stored in the database in the cases above.

An argument can be made for auto-unsetting of the field actually being beneficial in terms of storage. However, in such case we should also handle document creation correctly so that it doesn't set nums: [] by default when using Doc.objects.create(). Creation w/o the field is much more important here, because updates unsetting the field don't really make that extra space salvageable on the storage level until database compaction. There's also a certain cleanliness to the approach where your document only contains non-empty fields and it fits nicely with the non-relational paradigm of MongoDB.

On the other hand, always storing an empty list makes for easier querying where you can always be sure that querying by Doc.objects(nums=[]) or by Doc.objects(nums__size=0) will return the expected results. It also is more intuitive when you inspect your raw docs in MongoDB ("why is it empty" is a much easier question to reason about than "why doesn't it exist").

I'm personally leaning towards the latter argument of always storing the empty lists. What do you think @lafrech @thedrow @andrewsosa001 @gukoff ?

andrewsosa commented 7 years ago

@wojcikstefan I think it actually makes more sense to have empty lists automatically removed (what is currently happening). You made good points in your argument for that behavior.

It would be easy to adjust for in terms of querying, instead of the Docs.objects(nums=0), you would instead be looking for Docs.objects(nums__exists=False). On the flip side, it would be more consistent, e.g. Doc.objects(nums__size=n), where n could be 0, 1, 2, etc... .

Ultimately, I think as long as the approach is consistent document creation behavior (i.e., right now new documents with empty list fields initialize to []), either approach would be viable.

gukoff commented 7 years ago

An argument can be made for auto-unsetting of the field actually being beneficial in terms of storage.

This is true. However, there's always a tradeoff between simplicity and performance, and for sure one should keep away of the premature optimization. This heuristic seems like one, because we improve the memory issue without being explicitly asked to. And this improvement implies the simplicity/maintainability/transparency degradation, because you have to maintain the introduced complexity.

There's also a certain cleanliness to the approach where your document only contains non-empty fields and it fits nicely with the non-relational paradigm of MongoDB.

Is it? Implicitly having the same field in 2 possible states with different behaviour {non-existent/non-empty array} reminds me only of the nullable references aka the Billion Dollar Mistake.

I think that transparency is important. Mongoengine isn't a service, which uses the database as its private backend. It doesn't even hide the raw MongoDB queries to the full. It is a tool, which helps a user work with the database. Not the database helps a user work with mongoengine(which might be a rather convenient scenario). Therefore, all the tricks with the storage should be explicit.

A hyperbolized example of a similar heuristic can be a compression of the large strings. It would make storage more efficient, but the string indexes useless.

wojcikstefan commented 7 years ago

This is wonderfully put, thank you for taking the time to write it @gukoff ! I agree with all of your points.

Clearly current implementation is already causing confusion, as proven by this issue. We could try to tweak it and fix some of its problems, but it will always have certain edge cases. For example, Doc.objects.update(pull__nums=last_remaining_value_in_a_given_doc) will leave you with an empty list and attempting to unset it in a "smart" implicit way would be way too hacky and auto-magical. We're just asking for trouble/confusion/code complexity.

In the future, we should go with the more explicit (and thus more intuitive) solution. The only "magic" I'm a fan of is to default to an empty list in the MongoEngine object if the document we loaded from the database doesn't contain a specific list field or if that field is set to null. This way, we always ensure that it's possible to iterate over that field w/o doing any extra checks. This is quite intuitive and consistent with past behavior.

wojcikstefan commented 7 years ago

Small note wrt consistency among different field types: When you assign a None value to a particular field, we unset it from the MongoDB doc (see the code below). This behavior seems pretty natural to me and I'd probably want to preserve it. However, stuff like doc.list_field.pop()/remove() is already one step further in terms of implicitness. The question that remains is: Does it really make sense to set empty complex values (e.g. lists or dicts) in the doc upon creation? I'm leaning towards a yes because, unless a field is explicitly unset, it ensures we'll always be able to query for docs with empty lists via db.doc.find({nums: [] }).

HakuG commented 6 years ago

Did this ever get resolved? I feel like there should at least be a way to make sure the field is not unset, preferably when defining the document's fields.

anlauren commented 5 years ago

Same opinion here, was that ever resolved? As it turns out, unsetting the fields seems to not cascade on the CachedReferenceFields as well

alaminKutumbita commented 5 years ago

I found a nice little work around this issue. You can use this test_doc.save() test_doc.update(add_to_set__list_field = []) it will update the reference with the empty list field

HakuG commented 5 years ago

I wonder if there's a way to make it so that it can be done on the Document page... like "if (len(field) == 0): add_to_Set__list_field." I'm not too versed in mongoengine now to do it, but would someone know? Maybe it's done on the pre_save hook?

On Sat, Mar 2, 2019 at 7:00 AM alaminKutumbita notifications@github.com wrote:

I found a nice little work around this issue. You can use this test_doc.save() test_doc.update(add_to_set__list_field = []) it will update the reference with the empty list fiel

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/MongoEngine/mongoengine/issues/267#issuecomment-468914046, or mute the thread https://github.com/notifications/unsubscribe-auth/AFQjWYjx0ASpbjGlQdHL6le4uMNhztdEks5vSmfLgaJpZM4AjGfp .

-- Harnek Gulati Occupation: Founder of MakerFleet Education: Harvard University Computer Science Graduate w: ilikebuildingthings.com and makerfleet.com e: gulati.harnek@gmail.com

anlauren commented 5 years ago

The problem @alaminKutumbita is that no post_save hooks is triggered on update, moreover it doesnt update the cached reference fields :/

neilgoldman commented 5 years ago

Having a similar issue when setting a DictionaryField to an empty object/dict, the field becomes unset. But the field has required=True so now the document fails validation on loading. I'm able to get around it by using pymongo functions, but am worried that later something will save the document and cause mongoengine to unset it again.

zellsun commented 5 years ago

You'll get different results when you query a collection or call .first() at this query.

For the whole collection, those unset fields are missing. But magically, .first() will return you default values for those unset fields.

I rely on this behavior as my workaround, that is I loop the whole results, and for every single iteration the default values are there. (However, the unset fields have been unset and no data in db actually.)

vivektweeny commented 5 years ago

Hi, In the model definition add default value like this "users = fields.ListField(default=['text/html'])", it will set empty list [ ] at time of updating the document.

vivektweeny commented 5 years ago
import mongoengine
class TestDoc(mongoengine.Document):
    list_field = mongoengine.ListField()

mongoengine.connect('test')
test_doc = TestDoc.objects.create()

After this operation, in spite of the fact that list_field was not given any value, the BSON document in the collection has a corresponding field initialized to an empty list:

cluster-1:PRIMARY> use test
switched to db test
cluster-1:PRIMARY> db.test_doc.find()                                                                                              
{ "_id" : ObjectId("515c24a4c02c1c58d59f02c0"), "_types" : [ "TestDoc" ], "list_field" : [ ], "_cls" : "TestDoc" }   

I understand why initializing to empty list could be a good idea - it allows subsequent list operations on that field to succeed. However, in the light of this, the following behaviour seems wrong to me:

test_doc.list_field = []
test_doc.save()

This call causes mongoengine to issue an $unset command, removing list_field from the BSON document:

cluster-1:PRIMARY> db.test_doc.find()                                                                                              
{ "_id" : ObjectId("515c24a4c02c1c58d59f02c0"), "_types" : [ "TestDoc" ], "_cls" : "TestDoc" }

So ListFields, unlike other field types, get initialized to empty lists when no value is given, but setting a ListField to empty list causes it to be removed from the underlying BSON document.

Hi, In the model definition add default value like this "users = fields.ListField(default=['text/html'])", it will set empty list [ ] at time of updating the document.

bagerard commented 4 years ago

FYI I started working on fixing this so that MongoEngine won't unset list/dict fields as soon as the list/dict is empty

hieubz commented 3 years ago

I got the same issue at v0.23.1.

zoebriois commented 1 year ago

Hello, any updates ?