art-framework-suite / art-root-io

0 stars 2 forks source link

Knock on effects from removal of public c'tors of TFileDirectory (Mu2e) #9

Closed kutschke closed 2 years ago

kutschke commented 2 years ago

Comment added Dec 12, 11:47 AM: I have been working on this myself with some input from Ray. You might want to jump straight to the 5th messsage.

Describe the bug In art suite 3.10 the public c'tors of TFileDirectory have been removed. This has knock on effects in Mu2e code. I have a fix that works but it's very intrusive. I would like to understand if a less intrusive solution is possible.

To Reproduce In a clean directory that sees the mu2e cvmfs area:

source /cvmfs/mu2e.opensciencegrid.org/setupmu2e-art.sh git clone https://github.com/Mu2e/Production git clone https://github.com/kutschke/Offline git checkout -b artv310_v0 origin/artv310_v0 cd .. mkdir muse cp /mu2e/app/users/kutschke/Development/artv3_10/muse/u018 muse muse setup -1 -q u018 time muse build -j 24 build/sl7-prof-e20-u018/Offline/tmp/Mu2eUtilities/src/EventWeightHelper.os

This fails with the error:

Offline/Mu2eUtilities/src/EventWeightHelper.cc:28:88: error: use of deleted function 'art::TFileDirectory::TFileDirectory(const art::TFileDirectory&)'

To build the version that works

cd Offline git checkout -b artv310_v1 origin/artv310_v1 cd .. muse build -j 24 build/sl7-prof-e20-u018/Offline/tmp/Mu2eUtilities/src/EventWeightHelper.os

To see the differences between the original code and the fix:

cd Offline git diff artv310_v0 Mu2eUtilities/src/EventWeightHelper.cc git diff artv310_v0 Mu2eUtilities/inc/EventWeightHelper.hh

The issue is in the .cc file. I would like to keep the structure of the original code in which we define an object that is/refers to/points to the correct directory; then have one block of code to make the histograms. I have not been able find a type that can refer to either topdir or to the return value of mkdir. It's not a big deal for a simple case like this but, if we need to create many histograms, it would be awkward. I suspect there will be many cases like this in our code base; there are at least 2.

If this is a high-priority issue Yes. This is high priority but I am not able to choose labels.

kutschke commented 2 years ago

There are at least 14 of these issues in Mu2e Offline. Here is the worst case:

https://github.com/kutschke/Offline/blob/c5c46029a44636c26f3feb162f9e2672a4d5c82b/Mu2eUtilities/src/TrackCuts.cc#L17

The issue is found inside a helper function that is used in the intiailizer list.

Clearly I need to go back to rvalue reference school.

kutschke commented 2 years ago

I don't think that I have fully wrapped my head around rvalues and rvalue references as used in this case. I would appreciate it if you can help me sort it out.

First, in TFileDirectory, only one c'tor is specified. It's a 4 argument c'tor and it's protected. I don't see any declaration of default c'tor, the copy c'tor or copy assignment, nor the move c'tor and move assignment. Does making the one declared c'tor protected, also make all rule of 5 functions protected?

What about the default c'tor? I have a vague recollection that it's implicitly deleted if you provide any non-default c'tor. Do I remember correctly?

The function TFileDirectory::mkdir returns a TFileDirectory object by value. The caller cannot receive the return by value because there is no public copy c'tor or public copy assignment. So the returned object must be an unnamed temporary. If the caller were to receive the return value as an lvalue reference, there would be a memory leak because an lvalue reference never deletes its referent and there is no other mechanism to delete the temporary. Therefore the language forbids it. The caller may receive the return value as an rvalue reference because an rvalue reference will delete it's referent when it (the rvalue reference) goes out of scope. So that works. Is that story right so far?

Now for more confusion. As best I can there is no move c'tor or move assignment - so why does return by rvalue reference work? Does the compiler treat the publicness of these functions differently than it does the copy c'tor and copy assignment? Or does it bypass those functions entirely because it can construct the temporary in place by RVO? Or ... ?

And one side issue. I notice that TFileDirectory is a base class that is inherited by TFileService. However TFileDirectory does not have a declared virtual d'tor. Last I heard ( a long, long time ago) a base class was required to provide a virtual d'tor or bad things could happen. Have the rules changed?

kutschke commented 2 years ago

I have done some more homework and received some input from Ray. So I now understand the questions raised in this comment. I am leaving it here for reference.

------------------ original comment below ---------------------------

I read some more about rvalue references and learned that you may bind an rvalue to a const lvalue reference. I have made two small examples to play with this:

https://github.com/kutschke/TFD

See the README. So my question is: is my example TFD2_module.cc legit c++ ?

kutschke commented 2 years ago

With some input from Ray I have been able to focus on the core problem. This expression:

std::string name = ... ; artFileDirectory& topdir = ...; art::TFileDirectory tfdir = name.empty() ? topdir : topdir.mkdir(name.c_str());

no longer compiles. I understand that a copy is needed when name is empty - and there is no copy c'tor. So it fails. Hopefully the intent of the code is clear. After doing my homework I expected the following to compile:

art::TFileDirectory const& tfdir = name.empty() ? topdir : topdir.mkdir(name.c_str());

I based this expectation on the fact that the following two lines do compile: art::TFileDirectory const& tfdir = topdir; art::TFileDirectory const& tfdir = topdir.mkdir(name.c_str());

Note that these are the two components in the ternary expression. As it happens, the ternary fails with the error that there is no public copy c'tor or copy assignment, the same as before.

I think I understand and would like to know if I am right. I think that each of the 3 expressions in the ternary operator are their own scope - is that true? If so the temporary returned by topdir.mkdir(name.c_str()) will go out of scope and be deleted immediately. Therefore there is nothing to bind to lhs. Therefore compiler decides to make a second unnamed temporary that is a copy of the first unnamed temporary but having a lifetime matched to the scope of the lhs. If the compiler were smarter it could elide the copy and adjust lifetimes. But it's not that smart and there is no way to do the copy. Is that analysis correct?

So the Catch 22 is that topdir and the temporary returned by topdir.mkdir(name.c_str()) have different lifetimes. So there is no way to bind them to a single lhs object.

As best I can tell, we are left with 3 options.

  1. Restore the copy c'tor to TFileDirectory. I have no idea whether or not this is a reasonable request.
  2. Restructure 14 of our classes as shown below
  3. Do evil things with gDirectory.

Code fragment for option 2: If ( name.empty() ) { // do the work (topdir); } else { art::TFileDirectory tfdir = topdir.mkdir(name.c_str()); // do the work (tfdir) } If the “do the work” is more than 1 line we can encapsulate it in a function. Option 3 is straightforward but tedious and error prone.

knoepfel commented 2 years ago

@kutschke, thank you for the bug report. I understand the problem. I will investigate solutions today.

knoepfel commented 2 years ago

@kutschke, there is another option you might consider:

- art::TFileDirectory tfdir = name.empty() ? topdir : topdir.mkdir(name.c_str());
+ art::TFileDirectory tfdir = topdir.mkdir(name);

If name is empty, tfdir will refer to the same directory as topdir--that's because the argument to mkdir is interpreted relative to topdir. Can you try this and see if it works? You'll still need to pass in the topdir argument by reference instead of by value, but perhaps that's sufficient.


A couple points:

  1. The ternary operator performs a serious of conversions to ensure consistent types. Usually, this is not a problem, but sometimes it can result in difficult to understand compilation behavior. You're running into that here.
  2. The copy constructors should arguably have been removed whenever art 3 was released--TFileDirectory contains a mutex data member that is not copyable and the user-defined copy constructor was just ignoring it.
kutschke commented 2 years ago

@kutschke, there is another option you might consider:

- art::TFileDirectory tfdir = name.empty() ? topdir : topdir.mkdir(name.c_str());
+ art::TFileDirectory tfdir = topdir.mkdir(name);

If name is empty, tfdir will refer to the same directory as topdir--that's because the argument to mkdir is interpreted relative to topdir. Can you try this and see if it works? You'll still need to pass in the topdir argument by reference instead of by value, but perhaps that's sufficient.

Thanks Kyle. I will try it and let you know the result.

kutschke commented 2 years ago

The code compiles but I have not yet tested that it works. I think I know how to make a standalone test that does not require making all 14 edits.

I am confused about one thing. Suppose that the string is empty. It will create a new TFileDirectory object pointing to the same underlying TDirectory. Presumably I can access that one TDirectory using either object. Does the mutex protect against that? For our code, as it exists today, there can never be a race condition since the relevant parts of our code are single threaded. I am asking in case this needs to be on our radar in the long term future.

kutschke commented 2 years ago

The code also passes my standalone tests. I will pass this on to Mackenzie so that she can change it where needed, complete the port to art v3.10 and test.

Please close the ticket - we will open a new one if there are follow-on issues.

knoepfel commented 2 years ago

Glad to hear, Rob. Thanks for the update. Incidentally, I will open a new issue report:

Unfortunately, each TFileDirectory object has its own mutex--this is a hole in the current system. Anyone who interacts only with TFileService uses the same mutex (as TFileService is a derived class of TFileDirectory). However, once a separate TFileDirectory object is created, a separate mutex is created also. This must be fixed.