AIDASoft / podio

PODIO
GNU General Public License v3.0
23 stars 57 forks source link

Add a parameter `cloneRelations` for `clone` to be able not to clone the collections #609

Closed jmcarcell closed 2 months ago

jmcarcell commented 3 months ago

BEGINRELEASENOTES

ENDRELEASENOTES

See this comment and this comment

jmcarcell commented 3 months ago

I would argue this is intended behavior. Cloning specifically only copies the data and leaves the relations alone. Hence, it is technically user responsibility to fix up the relations. Mainly because there is no way to solve this in all generality.

The workflow for keeping the relations as they are in a cloned collection seems too complicated then:

This makes merging collections and wanting to use them as if they were "original" ones quite cumbersome, no?

Allowing the pointers to be different allows an original and cloned object to diverge and still compare to true, even if the data is the same (which is not checked for in the PR but probably should) then the relations could be different.

I am also not entirely sure if the tests would still be working if they are done with the immutable objects. Because in that case clone default initializes the ObjectID, which is {untracked, untracked}.

This I didn't understand but I made some changes in unittest.cpp for untracked objects outside of collections.

tmadlener commented 3 months ago
  • Create a new collection without cloning the elements (we will need to update the relations and we can't remove stuff from there if they are cloned), so with new empty elements where we will have to copy manually all the data and all the other relations except the one we're updating

That is a good point, actually. I didn't think of that. So we would definitely need the possibility for the Mutable types to change their relations, resp. clear them, so that at leas the data is easily clone-able, and then relation fixing is actually possible. Potentially, we can give clone a parameter that just hands back new objects with empty relations as well to make things a bit more usable if necessary.

I think if the purpose is to clone a complete collection, then the functionality should be implemented on the collection level, no? E.g. if you clone a Track collection, and a TrackerHit collection, to which TrackerHit collection should the cloned tracks point? I am not sure there is a generally true answer here. And in that case, I would prefer to have the option to choose what I actually want to do.

MCParticle might be a special case here because usually there is exactly one of those per event and it is self-contained, so cloning that can be handled accordingly. However, that would then belong to the EDM and not in podio, because podio can't know about these special cases.

Allowing the pointers to be different allows an original and cloned object to diverge and still compare to true, even if the data is the same (which is not checked for in the PR but probably should) then the relations could be different.

This will just make things extremely complicated I think, because you have to keep track of quite a few things. Currently the user facing classes are rather simple handles and as long as the m_obj pointer is the same, you can be sure that they actually handle the same data (and relations). If we open that up comparisons become rather expensive because we have to compare everything, effectively. And since the m_obj pointers use user facing handles for the relation handling this will potentially create a rather large graph that you have to check.

I am also not entirely sure if the tests would still be working if they are done with the immutable objects. Because in that case clone default initializes the ObjectID, which is {untracked, untracked}.

This I didn't understand but I made some changes in unittest.cpp for untracked objects outside of collections.

What I meant was something like this:

const auto& coll = frame.get<ExampleMCCollection>("mcparticles"); // <-- Mainly for getting a const& to a collection
auto par = coll[0]; // <-- immutable
auto dau = coll[1]; 

auto cPar = par.clone();
auto cDau = dau.clone();

REQUIRE(cPar.daughters()[0] == cDau); // <-- does this still work?
jmcarcell commented 3 months ago

That is a good point, actually. I didn't think of that. So we would definitely need the possibility for the Mutable types to change their relations, resp. clear them, so that at leas the data is easily clone-able, and then relation fixing is actually possible. Potentially, we can give clone a parameter that just hands back new objects with empty relations as well to make things a bit more usable if necessary.

I think if the purpose is to clone a complete collection, then the functionality should be implemented on the collection level, no? E.g. if you clone a Track collection, and a TrackerHit collection, to which TrackerHit collection should the cloned tracks point? I am not sure there is a generally true answer here. And in that case, I would prefer to have the option to choose what I actually want to do.

I have added a parameter to clone without the relations. Having a look at this I think Mutable types being able to change the relations is not needed; the use case we were discussing here would be after cloning the object since you would have relations to "old" objects. However, with the new parameter when cloning this shouldn't be necessary (or at least I don't know which other use case would require this) and it's cleaner to use clone(true) since you only copy what you need. This should make it a bit less cumbersome to clone a collection since all the data is cloned and then one has to deal only with the relations manually. But since there isn't a clear answer on how to clone the relations, this manual handling probably makes sense, or at least we can start from here.

For the future, I have been thinking about a .cloneCollection() that would only update the relations that are pointing to objects in the current collection (since how to update inter-collection relations doesn't have an answer).

tmadlener commented 2 months ago

I added a small figure that at least schematically shows what is happening plus a few sentences, such that there is at least a tiny bit of documentation together with this feature. I would wait for CI to pass and the merge this.