Open sphuber opened 3 years ago
Hmmm, tricky one. I'm not against it per se, but I'm always a little wary of changing the behaviour of these dunder methods, which don't have an easy deprecation path or a really concrete expected behaviour. It's just difficult to "guess" whether this will break anything unexpectedly (in core or in plugin/user code). Perhaps at least try changing the behaviour in a fork and running the test suite?
> This would align with Python's rule where a deepcopy of an immutable object simply returns that object since there only exists one reference of that object.
Except nodes are not strictly immutable, given e.g. the extras field.
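For reference, the Python rule being discussed can be observed directly with the standard library; a minimal sketch:

```python
import copy

# Deepcopy of an immutable object (all elements immutable) returns the
# object itself rather than a new copy:
immutable = (1, 2, ("nested", "tuple"))
same_object = copy.deepcopy(immutable) is immutable  # True

# Deepcopy of a mutable object returns a genuinely new object:
mutable = [1, 2, 3]
copied = copy.deepcopy(mutable)
independent = copied is not mutable  # True
```

This is the behaviour the proposal would mirror for stored nodes.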
Fully agree that changing this would be tricky with regard to whether it will break existing behavior. But first I would like to establish whether the majority think this should be the expected behavior. If that is the case, then I will investigate what the expected impact would be, and whether we can put this in 2.0 or we have to find another solution.
The problem is that it is not trivial to work around in plugins. Where we call deepcopy ourselves, we can replace it with a custom recursive method that deepcopies everything except for nodes. The trickier part is that we cannot change this for code downstream (but maybe that is not a problem).
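Such a custom recursive method could look roughly like the following sketch; note that `Node` here is a hypothetical stand-in for the AiiDA node class, and `deepcopy_except_nodes` is an illustrative name, not an existing API:

```python
import copy


class Node:  # hypothetical stand-in for an AiiDA node class
    pass


def deepcopy_except_nodes(obj):
    """Recursively copy containers, but return node instances as-is."""
    if isinstance(obj, Node):
        return obj  # keep the original node so provenance is preserved
    if isinstance(obj, dict):
        return {key: deepcopy_except_nodes(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return type(obj)(deepcopy_except_nodes(item) for item in obj)
    return copy.deepcopy(obj)


node = Node()
inputs = {'structure': node, 'meta': {'counts': [1, 2]}}
copied = deepcopy_except_nodes(inputs)
```

The containers are fully independent copies, while the node inside them is still the original object.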
> But first I would like to establish if the majority think this should be the expected behavior.
Fair, and my answer would be I don't know off-hand lol. Perhaps it would be good to list the use cases you envisage for `deepcopy`.
The current `Data.__deepcopy__` implementation, implemented by this commit and present since `v1.0.0`, returns a clone of the node, without any of the links. This was at the time seen as the expected behavior when a user calls `copy.deepcopy` on a node. However, in `aiida-common-workflows` we have now found a use case where this is undesirable behavior. The core of it is that a dictionary of inputs, which can contain nodes, is passed to a method as keyword arguments, and the method performs operations on it that can modify it, before passing it to a process as inputs. The method creates a deepcopy before performing the operations, as it should not affect the original inputs, since those may be reused for another iteration. However, this leads to any nodes in the input dictionary being cloned, and the clones will be used for the process inputs, but they won't have the original links, and so provenance is lost. In this example, there is no reason to clone the node; it can keep the original one.

Is there a good reason to be returning a clone for a deepcopy? Maybe we should distinguish between stored and unstored nodes, as they are immutable and mutable, respectively? This would align with Python's rule where a deepcopy of an immutable object simply returns that object, since there exists only one reference to that object. Are there cases where returning the same stored node when it is deepcopied leads to issues? Can we change this behavior for `v2.0`?
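The stored/unstored distinction could be sketched as follows; `Data`, `store`, and `clone` here are simplified hypothetical stand-ins, not the real `aiida.orm` implementations:

```python
import copy


class Data:
    """Minimal sketch of the proposed behaviour, not the real aiida.orm.Data."""

    def __init__(self):
        self._stored = False

    def store(self):
        # Storing makes the node's content immutable.
        self._stored = True
        return self

    def clone(self):
        # Stand-in for the current behaviour: a fresh node without links.
        return type(self)()

    def __deepcopy__(self, memo):
        if self._stored:
            # Stored nodes are immutable: return the original, keeping provenance.
            return self
        # Unstored nodes are still mutable, so return a genuine copy.
        return self.clone()


stored = Data().store()
unstored = Data()
```

With this, `copy.deepcopy(stored)` returns the same object, while `copy.deepcopy(unstored)` returns a new one.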