MongoEngine / mongoengine

A Python Object-Document-Mapper for working with MongoDB
http://mongoengine.org
MIT License
4.23k stars 1.23k forks source link

How to copy all data in document when changing Primary Key ? #2013

Open bekab95 opened 5 years ago

bekab95 commented 5 years ago

When I am saving document with different PK only other document is created with empty fields why ?

bagerard commented 5 years ago

Hi @ufo911, I agree that this is confusing. It is due to the fact that mongoengine tracks the fields that you are updating (and updates specifically what changed later on) and I believe it then ends up firing an upsert at the end, in this case it spots that you modify the _id field and no other field so the upsert is creating an empty document with just the id that you've changed.

This is clearly a bug as it allows you to create a document that violates the Document constraint (for instance the required fields).

Currently (and to my knowledge) there is no easy/robust way to clone an object. The safest is to manually define a clone method in your Document classes. Having a generic clone would be a great addition though, many people are looking for this.

bagerard commented 5 years ago

For future ref, here is a reproducible snippet that shows the issue:

class Person(Document):
    name = StringField()
p = Person(name='John').save()
print(p.id)        # ObjectId('5c86b9324ec5dc1dda1ea566')
list(Person.objects.as_pymongo())    # [{u'_id': ObjectId('5c86b9324ec5dc1dda1ea566'), u'name': u'John'}]
p.id = ObjectId()
print(p.id)    # ObjectId('5c86b9524ec5dc1dda1ea567')
print(p._changed_fields)    # ['_id']
p.save()
list(Person.objects.as_pymongo())    # [{u'_id': ObjectId('5c86b9324ec5dc1dda1ea566'), u'name': u'John'},  {u'_id': ObjectId('5c86b9524ec5dc1dda1ea567')}]
bekab95 commented 5 years ago

May it is possible to determine if the primary key is new before document save ?

amcgregor commented 5 years ago

May it is possible to determine if the primary key is new before document save ?

"How?" is the question that immediately comes to mind. Saved vs. unsaved for my own records is typically the difference between having a value for the _id key or not. Re-assigning a new ID seems… wrong. If instead of re-assigning, you instead del p.id, when trying to save the driver itself (PyMongo) will add an ID, and it'll probably work a whole lot better.

Alternatively, use the force_insert=True argument to MongoEngine's save call. 'Cause again, from just "having an ID or not having an ID", even with dirty field tracking, you can't know if the newly assigned ID is for insertion or update… you're calling save which is meaningless to PyMongo. (There's a reason MongoDB generally avoids "weak statefulness" like that.

bekab95 commented 5 years ago

If the _id or Primariy key (as we call) will be checked before save and if _id is different for the documen, we will call other method which will be used for creating new document (save method for new document) and if _id is same we will use current save method. after that for the method for creating new document, we have to check if _id already exist in database and will show error or make new document. Finally we need new method for comparing current _id and _id before save (if it is new)

amcgregor commented 5 years ago

If the _id or Primariy key (as we call) will be checked before save and if _id is different for the document…

if _id is same we will use current save method

if _id already exist in database and will show error or make new document

Under no circumstances must a write be bound to a preparatory read. The moment you do this, you've introduced a race condition. Worker A wants to "save", worker B wants to "save", both issue the reads, get "non-existent record' answers, both insert. You've duplicated a record, and this is the nicest of the failure situations, which also involve change clobbering possibilities.

Finally we need new method for comparing current _id and _id before save (if it is new)

Any verification must happen against the already loaded "record", whatever that may be, with the possibility for conditional updates to be useful. The logic can be simple:

Yes, this means you can get tricky, assign a different ID, and confuse the heuristic. OTOH, that type of use is typically a bug or otherwise unintentional, and also why hints like force_insert exist. Also note the intended behaviour of differentiating between actually new, and intended-to-be-inserted-new. Non-existent attribute vs. None value.

amcgregor commented 5 years ago

Note that all of this save confusion is an artifact of MongoEngine attempting to mirror database-side state client-side in every application worker process, then monitor that state for changes. Ostensibly helpful, but you still end up needing to wrap your head around both sides of the update, both Python-side and DB-side. I advocate—strongly—for plain PyMongo data access semantics; use the provided and entirely unambiguous insert_one, insert_many, update_one, update_many. Anywhere you have written save is better and more accurately described using the actual operation to be performed. I read save, and I then immediately need to worry, myself, about tracking the hypothetical state of that object through the flow of the code. (From the Zen of Python: explicit is better than implicit.)