coleifer / peewee

a small, expressive orm -- supports postgresql, mysql, sqlite and cockroachdb
http://docs.peewee-orm.com/
MIT License
11.17k stars 1.37k forks source link

Implement __deepcopy__ and __copy__ methods in Node #2884

Closed travnick closed 6 months ago

travnick commented 6 months ago

There is clone() method in Node. But as far as I can see on the internet, it's not clear if this should perform shallow or deep copy - I was convinced it's deep copy and got the database broken.

I would suggest implementing it in very clear way with use of a python __deepcopy__ and __copy__, so it's well-defined what is the result of copy() or deepcopy().

coleifer commented 6 months ago

Can you provide details on what you did and how it failed?

coleifer commented 6 months ago

Not a bug at any rate, as the clone API is intended to provide a safe copy for certain operations. The purpose is neither to copy nor to deepcopy, but to provide a new reference to an equivalent Node so that additional mutations can take place without disturbing the original. This is important because people frequently chain together various operations to produce new Node objects.

travnick commented 6 months ago

I did the clone(), and then reset id field, changed other fields. after debugging, I found out that modifications on cloned object did modify the original one values - so that's how my database got broken (development one fortunately, but there is a risk)

coleifer commented 6 months ago

I still don't have enough information on how this occurred or what you tried to implement to be able to provide much advice.

travnick commented 6 months ago

Simply database went into invalid state (in the software meaning, database itself is working, but has broken data for my app) since was modifying wrong records (because of unwanted changes to original object)

For now, I've used cloned_entry = deepcopy(original_entry) and changing the cloned_entry does not modify the original one. Code looks like working properly, but not well tested yet. Since peewee does magic with classes and objects, I wonder if using deepcopy does not copy too much.

coleifer commented 6 months ago

If you can share the problematic code I can offer help, but I'm still unclear what you were intending to implement and, furthermore, how you intended to use copy/clone.

travnick commented 6 months ago

My case is complicated, because not having transactions in SqliteQueuedDatabase (I guess it will end up with implementing the transactions, since it makes error recovery way too hard without).

Having Two tables:

There is also a unique key, that combines metadata.status (few of them), and file unique marker. For some reason upsert did not work claiming that there is no unique key o_O, but at the same time, it's not possible to insert new metadata record, because of having unique key :D

Now I need to change a batch of files, together with creating new metadata entries for them (so if there is an error in the middle, then we have broken data).

Since these are separate tables, I do not see a way to use batch_update/batch_insert with peewee.

As a very basic and incomplete error recovery, I'm just iterating over a collection of files, and:

  1. making a clone (deep dopy) of metadata,
  2. preparing a new metadata record,
  3. inserting new file record,
  4. changing status of original metadata entry (requiring here the clone),
  5. inserting new metadata.

In case of error, trying to recover to previous state (I have original entries untouched, because of clone - deep copy).


So in case having clone() making a shallow copy not a deep copy (unexpected/unclear interface) I ended up with totally mixed data - that is why I suggest removing unclear clone() and provide clear interface implementation of __deepcopy__ and __copy__. In that way it will be clear what to expect as the output.

coleifer commented 6 months ago

It might be worth trying to use just regular SqliteDatabase and relying on sqlite's built in busy timeout to handle cases when another thread has the database locked for writes, that way you can have transactional behavior. Upsert should work, but there is also the older REPLACE which may also work if your upsert logic is very simple.

I still, unfortunately, am not clear on what exactly went wrong but nonetheless hopefully the above may help.

travnick commented 6 months ago

the very basic example that produced unexpected (for me) behaviour is:

# eg. original_entry.is == 1
cloned_entry = original_entry.clone()

cloned_entry.id = None

# and now cloned_entry.id == None
# this one is ok, but on the other hand very suprised because of
# original_entry.id == None
# the same thing with other properties (db columns) of the object

# working way
cloned_entry = deepcopy(original_entry)

cloned_entry.id = None

# and now cloned_entry.id == None
# this one is ok, and original one remains valid
# original_entry.id == 1
# the same thing with other properties (db columns) of the object

Ok, will try simply using SqliteDatabase

coleifer commented 6 months ago

Yes, you don't want to use clone() on a model instance, and the Model.clone() method is not documented as being the way to accomplish this. If you wish to "clear" a model and re-save it, just clear it's id field - this will cause a new row to be inserted when you call save.