aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io
Other
436 stars 189 forks source link

Should `Data.__deepcopy__` return a clone of the node #5198

Open sphuber opened 3 years ago

sphuber commented 3 years ago

The current Data.__deepcopy__ implementation, implemented by this commit and present since v1.0.0, returns a clone of the node, without any of the links. This was at the time seen as the expected behavior when a user calls copy.deepcopy on a node. However, in aiida-common-workflows we have now found a use case where this is undesirable behavior. The core of it is that a dictionary of inputs, which can contain nodes, is passed to a method as keyword arguments and it performs operations on it that can modify it, before passing it to a process as inputs. The method creates a deepcopy before performing the operations as it should not affect the original inputs, since those may want to be reused for another iteration. However, this leads to any nodes in the input dictionary to be cloned and the clones will be used for the process inputs, but they won't have the original links and so provenance is lost. In this example, there is no reason to clone the node, but it can keep the original one.

Is there a good reason to be returning a clone for a deepcopy? Maybe we should distinguish between stored and unstored nodes, as they are mutable and immutable, respectively? This would align with Python's rule where a deepcopy of an immutable object simply returns that object since there only exists one reference of that object. Are there cases where returning the same stored node when it is deepcopied leads to issues? Can we change this behavior for v2.0?

chrisjsewell commented 3 years ago

Hmmm, tricky one. I'm not against it per se, but always a little wary of changing the behaviour of these dunder methods, that don't have an easy deprecation path and/or really concrete expected behaviour. Its just difficult to "guess" if this will break anything unexpectedly (in core or in plugin/user code). Perhaps at least just try changing the behaviour in a fork and running the test suite?

This would align with Python's rule where a deepcopy of an immutable object simply returns that object since there only exists one reference of that object.

Expect nodes are not stricty immutable, given e.g. the extras field

sphuber commented 3 years ago

Fully agree that changing this would be tricky with regards to whether it will break behavior. But first I would like to establish if the majority think this should be the expected behavior. If that is the case, then I will investigate what the expected impact would be and if we can put this in 2.0 or we have to have another solution.

The problem is that it is not trivial to workaround in plugins. There were we call deepcopy ourselves we can replace it with a custom recursive method that deepcopies everything except for nodes. The trickier part is that we cannot change this for code downstream (but maybe that is not a problem).

chrisjsewell commented 3 years ago

But first I would like to establish if the majority think this should be the expected behavior.

Fair, and my answer would be I don't know off-hand lol. Perhaps, it would be good to list the use cases you envisage for deepcopy