art-framework-suite / art-root-io


art job influenced by random files #1

Open knoepfel opened 2 years ago

knoepfel commented 2 years ago

This issue has been migrated from https://cdcvs.fnal.gov/redmine/issues/19599 (FNAL account required) Originally created by @gaponenko on 2018-04-06 22:13:37


Hi,

It looks like the behavior of an art job can be influenced by "random" rogue files on disk that are not findable via PATH, LD_LIBRARY_PATH, etc. This happens even when there is no '.' anywhere in the environment. This is a bug: jobs should be defined by their release area and explicit inputs, not affected by other random files.

Andrei

1) Prepare a test file:

   ssh mu2egpvm01.fnal.gov
   setup mu2e
   source /cvmfs/mu2e.opensciencegrid.org/Offline/v6_5_2/SLF6/prof/Offline/setup.sh
   mkdir -p /mu2e/app/users/$(whoami)/20180406-dict-breakage
   cd /mu2e/app/users/$(whoami)/20180406-dict-breakage
   cp -pr /mu2e/app/users/gandr/20180406-dict-breakage/inputs .
   mu2e -c inputs/testjob.fcl  # takes a couple of minutes

2) Verify that data products can be listed without complaints:

   mu2e -c Print/fcl/dumpDataProducts.fcl carbon_muons_hits.art  > /dev/null

(no complaints)

3) Drop a bomb by moving the RecoDataProducts directory from inputs/ into the current directory:

   mv inputs/RecoDataProducts .

4) Re-run the data product listing:

   mu2e -c Print/fcl/dumpDataProducts.fcl carbon_muons_hits.art  > /dev/null
   In file included from libmu2e_RecoDataProducts_dict dictionary payload:27:
   ./RecoDataProducts/inc/StereoHit.hh:5:20: error: typedef redefinition with different types ('mu2e::ComboHit' vs 'mu2e::StereoHit')
     typedef ComboHit StereoHit;
   ...
   ...
   fatal error: too many errors emitted, stopping now [-ferror-limit=]
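
Step 4 can be reasoned about as a search-order problem. The sketch below is hypothetical: it assumes (consistent with the "./" prefix in the error message, but not confirmed anywhere in this thread) that the current working directory is searched before the ROOT_INCLUDE_PATH entries. It reports which directory would supply a relative header such as RecoDataProducts/inc/StereoHit.hh and whether that directory lies inside a trusted release area. The names findProvider and isUnder are illustrative only and are not part of ROOT or art.

```cpp
#include <cassert>      // used by the checks in the usage example
#include <filesystem>
#include <functional>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Predicate telling the search whether a candidate file exists.
using ExistsFn = std::function<bool(const fs::path&)>;

// Default predicate: consult the real filesystem without throwing.
bool existsOnDisk(const fs::path& p) {
  std::error_code ec;  // unreadable paths are treated as absent
  return fs::is_regular_file(p, ec);
}

// Hypothetical helper: first-match search of `searchDirs` for `relativeHeader`.
// Returns the directory that would supply the header, or nullopt if none does.
std::optional<fs::path> findProvider(const std::vector<fs::path>& searchDirs,
                                     const fs::path& relativeHeader,
                                     const ExistsFn& exists = existsOnDisk) {
  for (const auto& dir : searchDirs) {
    if (exists(dir / relativeHeader)) return dir;
  }
  return std::nullopt;
}

// Hypothetical helper: true if `dir` is `root` itself or lies underneath it.
// weakly_canonical resolves symlinks for the parts of each path that exist.
bool isUnder(const fs::path& dir, const fs::path& root) {
  const fs::path rel =
      fs::weakly_canonical(dir).lexically_relative(fs::weakly_canonical(root));
  return !rel.empty() && rel.begin()->string() != "..";
}
```

Under the stated assumption, the search list for step 4 would be "." followed by the ROOT_INCLUDE_PATH entries; findProvider would then return ".", and isUnder(".", "/cvmfs/mu2e.opensciencegrid.org/Offline/v6_5_2") would be false, flagging the header as coming from outside the release. The existence check is injectable so the search order can be exercised without touching the disk.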
knoepfel commented 2 years ago

Comment by @knoepfel on 2018-04-09 16:26:20


Paul Russo wrote:

Oh,

This is ROOT behavior, check the definition of your ROOT_INCLUDE_PATH environment variable.

You are picking up this header file through it:

./RecoDataProducts/inc/StereoHit.hh:5:20: error: typedef redefinition with different types ('mu2e::ComboHit' vs 'mu2e::StereoHit') typedef ComboHit StereoHit;

knoepfel commented 2 years ago

Comment by @knoepfel on 2018-04-09 16:27:00


Rob then wrote:

Thanks Paul,

Can you explain a bit more about what is going on? Does the action happen in the JIT? At dictionary-load-time? At dictionary-use-time?

I take it that the problem is that the JIT can’t find the header file because it has been moved?

Or am I way out in left field? Waveland Avenue? Wisconsin?

Rob

knoepfel commented 2 years ago

Comment by @gaponenko on 2018-04-10 19:13:17


I checked Paul's suggestion. The offending file cannot be found via ROOT_INCLUDE_PATH. The value of that variable is shown below. Note that it does not include the /mu2e/app/users/gandr/20180406-dict-breakage directory or any of its parents. I also checked that there is no '/mu2e/app' anywhere in the environment (besides the PWD), and no dot anywhere.

Andrei

20180406-dict-breakage$ echo $ROOT_INCLUDE_PATH | tr ':' '\n'
/cvmfs/mu2e.opensciencegrid.org/Offline/v6_5_2/SLF6/prof/Offline
/cvmfs/mu2e.opensciencegrid.org/artexternals/cry/v1_7i/Linux64bit+2.6-2.12-e15-prof/cry_v1.7/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/mu2e_artdaq_core/v1_02_01e/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/art/v2_10_02/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/fhiclcpp/v4_06_05/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/cetlib/v3_02_00/slf6.x86_64.e15.prof/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/boost/v1_66_0/Linux64bit+2.6-2.12-e15-prof/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/artdaq_core/v3_01_05/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/artdaq_core/v3_01_05/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/TRACE/v3_13_04/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/art/v2_10_02/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/canvas_root_io/v1_01_02/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/xrootd/v4_8_0a/Linux64bit+2.6-2.12-e15-prof/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/mysql_client/v5_5_58/Linux64bit+2.6-2.12-e15/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/postgresql/v9_6_6a/Linux64bit+2.6-2.12-p2714b/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/pythia/v6_4_28j/Linux64bit+2.6-2.12-gcc640-prof/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/gsl/v2_4/Linux64bit+2.6-2.12-prof/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/fftw/v3_3_6_pl2/Linux64bit+2.6-2.12-prof/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/canvas/v3_02_02/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/range/v3_0_3_0/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/clhep/v2_3_4_5c/Linux64bit+2.6-2.12-e15-prof/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/messagefacility/v2_01_06/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/tbb/v2018_2/Linux64bit+2.6-2.12-e15-prof/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/fhiclcpp/v4_06_05/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/cetlib/v3_02_00/slf6.x86_64.e15.prof/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/boost/v1_66_0/Linux64bit+2.6-2.12-e15-prof/include
/cvmfs/mu2e.opensciencegrid.org/artexternals/cetlib_except/v1_01_06/include
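
The check above can be automated. The sketch below uses hypothetical helper names and standard C++17 only: it splits a colon-separated variable such as ROOT_INCLUDE_PATH and flags entries that are empty, "." or otherwise relative, as well as entries that appear more than once. Applied to the listing above it would find no relative entries, matching Andrei's observation, and several duplicates (the art, fhiclcpp, cetlib, boost and artdaq_core entries each appear twice). A clean result cannot rule out directories that ROOT searches on its own, which is exactly the situation reported in this issue.

```cpp
#include <cassert>      // used by the checks in the usage example
#include <set>
#include <string>
#include <vector>

// Split a colon-separated search path such as ROOT_INCLUDE_PATH.
// Empty fields (from "::" or a leading/trailing ':') are kept, because
// POSIX shells treat an empty PATH entry as the current directory.
std::vector<std::string> splitSearchPath(const std::string& value) {
  std::vector<std::string> entries;
  if (value.empty()) return entries;  // empty variable: no entries at all
  std::string::size_type start = 0;
  while (true) {
    const auto colon = value.find(':', start);
    // substr clamps the count, so npos - start means "to the end".
    entries.push_back(value.substr(start, colon - start));
    if (colon == std::string::npos) break;
    start = colon + 1;
  }
  return entries;
}

struct PathReport {
  std::vector<std::string> relative;    // empty, ".", or not starting with '/'
  std::vector<std::string> duplicates;  // each repeat of an earlier entry
};

// Hypothetical audit: flag entries that could resolve relative to the
// working directory, plus repeated entries.
PathReport auditSearchPath(const std::string& value) {
  PathReport report;
  std::set<std::string> seen;
  for (const auto& entry : splitSearchPath(value)) {
    if (entry.empty() || entry.front() != '/') report.relative.push_back(entry);
    if (!seen.insert(entry).second) report.duplicates.push_back(entry);
  }
  return report;
}
```

In a real job the input would be the value of std::getenv("ROOT_INCLUDE_PATH"), checked for nullptr first since the variable may be unset.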
knoepfel commented 2 years ago

Comment by Paul Russo on 2018-04-16 16:20:58


This is an example of auto-parse behavior. We should re-examine whether auto-parsing should be disabled in ROOT at art startup, and made configurable via FHiCL.

knoepfel commented 2 years ago

Comment by @kutschke on 2018-04-16 16:35:16


Hi Paul,

Can you explain "auto-parse behaviour"?

Thanks,

Rob

knoepfel commented 2 years ago

Comment by @knoepfel on 2018-04-23 16:33:44


Based on discussions with Rob and Andrei, the next step is to experiment with disabling ROOT's auto-parsing behavior. This must be done before the first input file is opened; disabling auto-parsing at module or service construction would satisfy that, since construction happens before the first input file is opened.

We are willing to experiment with this, but your own tests with auto-parsing disabled would be more informative for your use case.

knoepfel commented 2 years ago

Comment by @rlcee on 2018-04-23 16:47:47


You probably already discussed this with Rob and Andrei, but could you reply with a summary of what ROOT is trying to achieve with auto-parsing, and what functionality might be lost, now and in future versions of ROOT, if it is disabled? Thanks

knoepfel commented 2 years ago

Comment by @pcanal on 2020-01-10 19:07:13


Note that you can try running with auto-parsing disabled by calling:


gInterpreter->SetClassAutoparsing( false );

However, I don't think we have ever tested this mode carefully or completely, so some corner of ROOT itself may be relying on auto-parsing.

knoepfel commented 2 years ago

Comment by @pcanal on 2020-01-10 19:09:38


On a side note:


./RecoDataProducts/inc/StereoHit.hh:5:20: error: typedef redefinition with different types ('mu2e::ComboHit' vs 'mu2e::StereoHit')
  typedef ComboHit StereoHit;

Are there really two pieces of code that use the same typedef name for two different types? Or is this a change made from one release to the next?
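
The clash itself can be reproduced in plain C++, independently of ROOT. As the note in the later crash report shows, the dictionary's rootmap forward-declares mu2e::StereoHit as a class, while the header found at ./RecoDataProducts/inc/StereoHit.hh declares it as a typedef of mu2e::ComboHit; presumably these come from two different versions of the same package. A single translation unit cannot accept both. The minimal sketch below uses an empty stand-in class (hypothetical, not the real mu2e definition) and keeps the conflicting forward declaration commented out so that it compiles.

```cpp
#include <cassert>      // used by the checks in the usage example
#include <type_traits>

namespace mu2e {
  class ComboHit {};  // hypothetical stand-in for the real class

  // What the rootmap forward declarations contribute (seen first):
  //   class StereoHit;
  // What ./RecoDataProducts/inc/StereoHit.hh contributes (seen second):
  typedef ComboHit StereoHit;
  // If the forward declaration above is uncommented, the typedef is rejected
  // ("typedef redefinition with different types"), because one name cannot
  // be both a distinct class and an alias of another class.
}

// With only the typedef visible, StereoHit is just another name for ComboHit.
static_assert(std::is_same<mu2e::StereoHit, mu2e::ComboHit>::value,
              "StereoHit is an alias of ComboHit");
```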

knoepfel commented 2 years ago

Comment by @gaponenko on 2020-01-11 22:10:06


I created a service that makes the

gInterpreter->SetClassAutoparsing( false );

call in its constructor. If I add that service to Print/fcl/dumpDataProducts.fcl, then instead of the long list of errors ending in "fatal error: too many errors emitted, stopping now" from the original ticket, I get a shorter crash:

$ mu2e -c Print/fcl/dumpDataProductsNoAutoparse.fcl ../carbon_muons_hits.art   > /dev/null
In file included from libmu2e_RecoDataProducts_dict dictionary payload:27:
./RecoDataProducts/inc/StereoHit.hh:5:20: error: typedef redefinition with different types ('mu2e::ComboHit' vs 'mu2e::StereoHit')
  typedef ComboHit StereoHit;
                   ^
Forward declarations from /mu2e/app/users/gandr/autoparsing/Offline.autoparsing/lib/libmu2e_RecoDataProducts_dict.rootmap:1:828: note: previous definition is here
  ...mu2e { class StrawHitFlagDetail; }namespace mu2e { class HelixHit; }namespace mu2e { class ComboHit; }namespace mu2e { class StrawHitPosition; }namespace mu2e { class StereoHit; }namespace mu2e ...
                                                                                                                                                                            ^
Segmentation fault

If the "exploit" file ./RecoDataProducts/inc/StereoHit.hh is removed, everything works as before. So the attempt to disable auto-parsing did not break anything, but it also did not prevent ROOT from peeking at files in "." that it should not be looking at.

Are there other calls we could try, to stop ROOT from tripping itself up with unnecessary files?

knoepfel commented 2 years ago

Comment by @kutschke on 2020-09-02 14:11:40


I am interested in reviving this issue. We again had a situation in which ROOT_INCLUDE_PATH was not properly defined. The immediate problem was easy to fix but the long term issue remains.

I would like to know whether it is possible to build dictionaries, for use in art jobs, such that ROOT will not need to build dictionaries at run time. Or is that simply not possible? If it is possible, how do we go about it? And how do we test it?

Some wild guesses: is the issue as simple as declaring (at link time?) that dictionary B depends on dictionary A? We are almost certainly not doing that correctly. Is there a dictionary build-time switch we can set that will fail the build if using the dictionary would trigger additional run-time building of dictionaries?

I see that the issue status is "Feedback". I am not sure what you are waiting for from us; please advise.