Open bekab95 opened 5 years ago
Hi @ufo911, I agree that this is confusing. It is due to the fact that mongoengine tracks the fields that you are updating (and updates specifically what changed later on) and I believe it then ends up firing an upsert at the end, in this case it spots that you modify the _id field and no other field so the upsert is creating an empty document with just the id that you've changed.
This is clearly a bug as it allows you to create a document that violates the Document constraint (for instance the required fields).
Currently (and to my knowledge) there is no easy/robust way to clone an object. The safest is to manually define a clone method in your Document classes. Having a generic clone
would be a great addition though, many people are looking for this.
For future ref, here is a reproducible snippet that shows the issue:
class Person(Document):
name = StringField()
p = Person(name='John').save()
print(p.id) # ObjectId('5c86b9324ec5dc1dda1ea566')
list(Person.objects.as_pymongo()) # [{u'_id': ObjectId('5c86b9324ec5dc1dda1ea566'), u'name': u'John'}]
p.id = ObjectId()
print(p.id) # ObjectId('5c86b9524ec5dc1dda1ea567')
print(p._changed_fields) # ['_id']
p.save()
list(Person.objects.as_pymongo()) # [{u'_id': ObjectId('5c86b9324ec5dc1dda1ea566'), u'name': u'John'}, {u'_id': ObjectId('5c86b9524ec5dc1dda1ea567')}]
May it is possible to determine if the primary key is new before document save ?
May it is possible to determine if the primary key is new before document save ?
"How?" is the question that immediately comes to mind. Saved vs. unsaved for my own records is typically the difference between having a value for the _id
key or not. Re-assigning a new ID seems… wrong. If instead of re-assigning, you instead del p.id
, when trying to save the driver itself (PyMongo) will add an ID, and it'll probably work a whole lot better.
Alternatively, use the force_insert=True
argument to MongoEngine's save
call. 'Cause again, from just "having an ID or not having an ID", even with dirty field tracking, you can't know if the newly assigned ID is for insertion or update… you're calling save
which is meaningless to PyMongo. (There's a reason MongoDB generally avoids "weak statefulness" like that.
If the _id or Primariy key (as we call) will be checked before save and if _id is different for the documen, we will call other method which will be used for creating new document (save method for new document) and if _id is same we will use current save method. after that for the method for creating new document, we have to check if _id already exist in database and will show error or make new document. Finally we need new method for comparing current _id and _id before save (if it is new)
If the _id or Primariy key (as we call) will be checked before save and if _id is different for the document…
if _id is same we will use current save method
if _id already exist in database and will show error or make new document
Under no circumstances must a write be bound to a preparatory read. The moment you do this, you've introduced a race condition. Worker A wants to "save", worker B wants to "save", both issue the reads, get "non-existent record' answers, both insert. You've duplicated a record, and this is the nicest of the failure situations, which also involve change clobbering possibilities.
Finally we need new method for comparing current _id and _id before save (if it is new)
Any verification must happen against the already loaded "record", whatever that may be, with the possibility for conditional updates to be useful. The logic can be simple:
Without any _id
assigned (getattr
raises AttributeError
), the record is clearly brand spankin' new. Upon insert, record the inserted ID given back against the originating record; let the driver itself do the generation.
_id
of None
(using a __delete__
handler to assign None
if it was previously set) the record was previously saved, but we're trying to insert it new: insert, recording the inserted ID against the originating record. As above.
_id
of anything else, use it, it's an existing record to update. Update, don't insert (unless force_insert
).
Yes, this means you can get tricky, assign a different ID, and confuse the heuristic. OTOH, that type of use is typically a bug or otherwise unintentional, and also why hints like force_insert
exist. Also note the intended behaviour of differentiating between actually new, and intended-to-be-inserted-new. Non-existent attribute vs. None value.
Note that all of this save
confusion is an artifact of MongoEngine attempting to mirror database-side state client-side in every application worker process, then monitor that state for changes. Ostensibly helpful, but you still end up needing to wrap your head around both sides of the update, both Python-side and DB-side. I advocate—strongly—for plain PyMongo data access semantics; use the provided and entirely unambiguous insert_one
, insert_many
, update_one
, update_many
. Anywhere you have written save
is better and more accurately described using the actual operation to be performed. I read save
, and I then immediately need to worry, myself, about tracking the hypothetical state of that object through the flow of the code. (From the Zen of Python: explicit is better than implicit.)
When I am saving document with different PK only other document is created with empty fields why ?