art-framework-suite / art

The implementation of the art physics event processing framework

Fatal Root Error: TCling::LoadPCM when using TFileService #115

Closed: eflumerf closed this issue 1 year ago

eflumerf commented 2 years ago

Describe the bug: When starting art filters in the Mu2e Online Trigger system, we get the following output:

found 534 particles
 --------------- HepPDT Version 3.04.01 --------------- 
found 275 particles
found 3116 particles
%MSG-s ArtException:  DAQ 19-Oct-2021 10:46:54 CDT Booted
cet::exception caught in art
---- OtherArt BEGIN
  ServiceCreation
  ---- FatalRootError BEGIN
    Fatal Root Error: TCling::LoadPCM
    ROOT PCM /cvmfs/mu2e.opensciencegrid.org/artexternals/root/v6_22_08d/Linux64bit+3.10-2.17-e20-p392-prof/lib/libROOTVecOps_rdict.pcm file does not exist
    ROOT severity: 3000
  ---- FatalRootError END
  cet::exception caught during construction of service type art::TFileService:
---- OtherArt END
%MSG
Art has completed and will exit with status 1.

To Reproduce: Unfortunately, I don't have a minimal broken configuration for this bug; a large number of experiment-specific modules are loaded. The attached fcl file, artConfig_1_1636132158196160.fcl.txt, relies on the mu2etrig@mu2edaq test stand setup. We can give you permissions and instructions to reproduce from there.
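
For anyone trying to narrow this down without the full test stand: the error names a specific `_rdict.pcm` file that ROOT expected to find in a library directory. A minimal sketch of a check, assuming the PCM is expected to sit next to the shared library of the same name (that is what the path in the error suggests, but it is an assumption here), is to scan the library search directories and report where the library and its PCM are each present. The helper names `find_library_and_pcm` and `dirs_from_env` are hypothetical and not part of art or ROOT.

```python
import os
from pathlib import Path


def find_library_and_pcm(lib_name, search_dirs):
    """Report, per directory, whether <lib_name>.so and <lib_name>_rdict.pcm exist.

    Directories containing neither file are skipped. Assumes the PCM is
    expected alongside the shared library (an assumption, see lead-in).
    """
    report = []
    for d in search_dirs:
        directory = Path(d)
        has_lib = (directory / f"{lib_name}.so").is_file()
        has_pcm = (directory / f"{lib_name}_rdict.pcm").is_file()
        if has_lib or has_pcm:
            report.append((str(directory), has_lib, has_pcm))
    return report


def dirs_from_env(var="LD_LIBRARY_PATH"):
    """Split a path-like environment variable into its non-empty entries."""
    return [p for p in os.environ.get(var, "").split(os.pathsep) if p]
```

Calling `find_library_and_pcm("libROOTVecOps", dirs_from_env())` in the failing environment would show whether more than one directory provides the library, and whether any of them lacks the matching PCM.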

eflumerf commented 2 years ago

Thread 1 "art" hit Breakpoint 1, TCling::LoadPCM (this=0xab68e0, pcmFileNameFullPath=...) at /scratch/workspace/canvas-products-all/vedge-/SLF7/e20-prof/build/root/v6_22_08d/source/root-6.22.08/core/metacling/src/TCling.cxx:1756
1756      /scratch/workspace/canvas-products-all/vedge-/SLF7/e20-prof/build/root/v6_22_08d/source/root-6.22.08/core/metacling/src/TCling.cxx: No such file or directory.
(gdb) bt
#0  TCling::LoadPCM (this=0xab68e0, pcmFileNameFullPath=...) at /scratch/workspace/canvas-products-all/vedge-/SLF7/e20-prof/build/root/v6_22_08d/source/root-6.22.08/core/metacling/src/TCling.cxx:1756
#1  0x00007fff8607b2ea in TCling::RegisterModule (this=0xab68e0, modulename=0x7fffdc30e93a "libROOTVecOps", headers=0x7fffdc52b450 <(anonymous namespace)::TriggerDictionaryInitialization_libROOTVecOps_Impl()::headers>, includePaths=<optimized out>, payloadCode=<optimized out>, 
    fwdDeclsCode=<optimized out>, triggerFunc=0x7fffdc29fcb0 <(anonymous namespace)::TriggerDictionaryInitialization_libROOTVecOps_Impl()>, fwdDeclsArgToSkip=..., classesHeaders=<optimized out>, lateRegistration=true, hasCxxModule=true)
    at /scratch/workspace/canvas-products-all/vedge-/SLF7/e20-prof/build/root/v6_22_08d/source/root-6.22.08/core/metacling/src/TCling.cxx:2238
#2  0x00007fffe19e383e in TROOT::InitInterpreter (this=0x7fffe1e9e0e0 <ROOT::Internal::GetROOT1()::alloc>) at /scratch/workspace/canvas-products-all/vedge-/SLF7/e20-prof/build/root/v6_22_08d/source/root-6.22.08/core/base/src/TROOT.cxx:2067
#3  0x00007fffe19e3c57 in ROOT::Internal::GetROOT2 () at /scratch/workspace/canvas-products-all/vedge-/SLF7/e20-prof/build/root/v6_22_08d/source/root-6.22.08/core/base/src/TROOT.cxx:385
#4  0x00007fffe1afcde5 in TInterpreter::Instance () at /scratch/workspace/canvas-products-all/vedge-/SLF7/e20-prof/build/root/v6_22_08d/source/root-6.22.08/core/meta/src/TInterpreter.cxx:61
#5  0x00007fffe1a51af6 in TSystem::GetLibraries (this=0xa8e620, regexp=0x7fffe1bcbee5 "", options=0x7fffe1bcbee5 "", isRegexp=true) at /scratch/workspace/canvas-products-all/vedge-/SLF7/e20-prof/build/root/v6_22_08d/source/root-6.22.08/core/base/src/TSystem.cxx:2149
#6  0x00007fffe1a559aa in TSystem::Load (this=0xa8e620, module=0x7fffe1baec7f "libImt", entry=0x7fffe1bcbee5 "", system=false) at /scratch/workspace/canvas-products-all/vedge-/SLF7/e20-prof/build/root/v6_22_08d/source/root-6.22.08/core/base/src/TSystem.cxx:1853
#7  0x00007fffe19e015b in ROOT::Internal::GetSymInLibImt (funcname=0x7fffe1baecb9 "ROOT_TThread_Initialize") at /scratch/workspace/canvas-products-all/vedge-/SLF7/e20-prof/build/root/v6_22_08d/source/root-6.22.08/core/base/src/TROOT.cxx:397
#8  0x00007fffe19e112c in ROOT::EnableThreadSafety () at /scratch/workspace/canvas-products-all/vedge-/SLF7/e20-prof/build/root/v6_22_08d/source/root-6.22.08/core/base/src/TROOT.cxx:497
#9  0x00007fffdec999e5 in art::root::setup () at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/art_root_io/v1_08_03-buildFW/src/art_root_io/setup.cc:108
#10 0x00007fffe21662de in art::TFileDirectory::TFileDirectory (this=0x837f50, dir=..., descr=..., file=0x0, path=...) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/art_root_io/v1_08_03-buildFW/src/art_root_io/TFileDirectory.cc:26
#11 0x00007fffe23785d0 in art::TFileService::TFileService (this=0x837f50, config=..., r=...) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/art_root_io/v1_08_03-buildFW/src/art_root_io/TFileService.cc:43
#12 0x00007fffda9ece14 in ?? () from /mu2e/ups/art_root_io/v1_08_03/slf7.x86_64.e20.prof/lib/libart_root_io_TFileService_service.so
#13 0x00007fffda9ed12c in non-virtual thunk to art::detail::ServiceHelper<art::TFileService>::make(fhicl::ParameterSet const&, art::ActivityRegistry&, art::detail::SharedResources&) const ()
   from /mu2e/ups/art_root_io/v1_08_03/slf7.x86_64.e20.prof/lib/libart_root_io_TFileService_service.so
#14 0x00007ffff350a3d6 in art::detail::ServiceCacheEntry::makeService (this=0xa453b8, reg=..., resources=...) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/art/v3_09_03-buildFW/src/art/Framework/Services/Registry/detail/ServiceCacheEntry.cc:75
#15 0x00007ffff350a5d0 in art::detail::ServiceCacheEntry::forceCreation (this=this@entry=0xa453b8, reg=..., resources=...)
    at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/art/v3_09_03-buildFW/src/art/Framework/Services/Registry/detail/ServiceCacheEntry.cc:122
#16 0x00007ffff34fc95d in art::ServicesManager::forceCreation (this=0x99b0a0) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/art/v3_09_03-buildFW/src/art/Framework/Services/Registry/ServicesManager.cc:221
#17 0x00007ffff4920864 in art::EventProcessor::EventProcessor (this=<optimized out>, pset=..., enabled_modules=...) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/art/v3_09_03-buildFW/src/art/Framework/EventProcessor/EventProcessor.cc:184
#18 0x00007ffff7b7012a in art::run_art_common_ (main_pset=..., enabled_modules=...) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/art/v3_09_03-buildFW/src/art/Framework/Art/run_art.cc:369
#19 0x00007ffff7b71440 in art::run_art (argc=argc@entry=3, argv=argv@entry=0x7fffffff2808, all_desc=..., handlers=...) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/art/v3_09_03-buildFW/src/art/Framework/Art/run_art.cc:251
#20 0x00007ffff7b2d370 in artapp (argc=3, argv=0x7fffffff2808) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/art/v3_09_03-buildFW/build/art/Framework/Art/artapp.cc:58
#21 0x000000000040100d in main (argc=<optimized out>, argv=<optimized out>) at /scratch/workspace/critic-slf/BUILDTYPE/prof/QUAL/e20/label1/swarm/label2/SLF7/build/art/v3_09_03-buildFW/build/art/Framework/Art/art.cc:14
(gdb) p pcmFileNameFullPath
$1 = {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
    _M_p = 0x39faf10 "/cvmfs/mu2e.opensciencegrid.org/artexternals/root/v6_22_08d/Linux64bit+3.10-2.17-e20-p392-prof/lib/libROOTVecOps_rdict.pcm"}, _M_string_length = 122, {_M_local_buf = "z\000\000\000\000\000\000\000ecOps\000\000", _M_allocated_capacity = 122}}
(gdb)
mu2etrig@mu2edaq13:/home/mu2etrig/test_stand/ots:ups active
Active ups products:
art               v3_09_03        -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
artdaq            v3_11_00        -f Linux64bit+3.10-2.17 -q e20:prof:s112   -z /cvmfs/fermilab.opensciencegrid.org/products/artdaq
artdaq_core       v3_08_00        -f Linux64bit+3.10-2.17 -q e20:prof:s112   -z /cvmfs/mu2e.opensciencegrid.org/artexternals
artdaq_daqinterface  v3_11_00        -f NULL                                    -z /home/mu2etrig/test_stand/ots/localProducts_mu2edaq__e20_s112_prof
artdaq_database   v1_05_07        -f Linux64bit+3.10-2.17 -q e20:prof:s112   -z /cvmfs/fermilab.opensciencegrid.org/products/artdaq
artdaq_mfextensions  v1_07_00        -f Linux64bit+3.10-2.17 -q e20:prof:s112   -z /cvmfs/fermilab.opensciencegrid.org/products/artdaq
artdaq_utilities  v1_07_00        -f Linux64bit+3.10-2.17 -q e20:prof:s112   -z /cvmfs/fermilab.opensciencegrid.org/products/artdaq
art_root_io       v1_08_03        -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
boost             v1_75_0         -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
BTrk              v1_02_34        -f Linux64bit+3.10-2.17 -q e20:p392:prof   -z /cvmfs/mu2e.opensciencegrid.org/artexternals
canvas            v3_12_04        -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
canvas_root_io    v1_09_04        -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
cetlib            v3_13_04        -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
cetlib_except     v1_07_04        -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
cetmodules        v2_26_00        -f NULL                                    -z /mu2e/ups
cetpkgsupport     v1_14_01        -f NULL                                    -z /mu2e/ups
clhep             v2_4_4_1        -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
cmake             v3_20_0         -f Linux64bit+3.10-2.17                    -z /mu2e/ups
epics             v3_15_8         -f Linux64bit+3.10-2.17 -q e20             -z /mu2e/ups
fftw              v3_3_9          -f Linux64bit+3.10-2.17                    -z /mu2e/ups
fhiclcpp          v4_15_03        -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
gcc               v9_3_0          -f Linux64bit+3.10-2.17                    -z /mu2e/ups
gdb               v10_1           -f Linux64bit+3.10-2.17                    -z /mu2e/ups
gitflow           v1_12_3         -f NULL                                    -z /mu2e/ups
git               v2_31_1         -f Linux64bit+3.10-2.17                    -z /mu2e/ups
gsl               v2_6a           -f Linux64bit+3.10-2.17                    -z /mu2e/ups
hep_concurrency   v1_07_04        -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
heppdt            v03_04_02       -f Linux64bit+3.10-2.17 -q e20:prof        -z /cvmfs/mu2e.opensciencegrid.org/artexternals
KinKal            v00_01_07       -f Linux64bit+3.10-2.17 -q e20:p392:prof   -z /cvmfs/mu2e.opensciencegrid.org/artexternals
libxml2           v2_9_10a        -f Linux64bit+3.10-2.17                    -z /mu2e/ups
messagefacility   v2_08_04        -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
mongodb           v4_0_8c         -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
mrb               v5_18_01        -f NULL                                    -z /mu2e/ups
mysql_client      v8_0_23         -f Linux64bit+3.10-2.17 -q e20             -z /mu2e/ups
ninja             v1_10_2         -f Linux64bit+3.10-2.17                    -z /mu2e/ups
nodejs            v10_15_0        -f Linux64bit                              -z /mu2e/ups
numpy             v1_20_1         -f Linux64bit+3.10-2.17 -q e20:p392        -z /mu2e/ups
offline           v10_07_00       -f Linux64bit+3.10-2.17-sl7-9 -q e20:prof:s112:trig -z /mu2e/ups
openblas          v0_3_13         -f Linux64bit+3.10-2.17 -q e20             -z /mu2e/ups
otsdaq_components  v2_06_03        -f Linux64bit+3.10-2.17 -q e20:prof:s112   -z /cvmfs/fermilab.opensciencegrid.org/products/artdaq
otsdaq_epics      v2_06_03        -f Linux64bit+3.10-2.17 -q e20:prof:s112   -z /cvmfs/fermilab.opensciencegrid.org/products/artdaq
otsdaq_mu2e       v1_01_04        -f Linux64bit+3.10-2.17 -q e20:prof:s112   -z /mu2e/ups
otsdaq_utilities  v2_06_03        -f Linux64bit+3.10-2.17 -q e20:prof:s112   -z /cvmfs/fermilab.opensciencegrid.org/products/artdaq
postgresql        v13_2           -f Linux64bit+3.10-2.17 -q p392            -z /mu2e/ups
pqxx              v6_2_5e         -f Linux64bit+3.10-2.17 -q e20:p392:prof   -z /mu2e/ups
pythia            v6_4_28r        -f Linux64bit+3.10-2.17 -q gcc930:prof     -z /mu2e/ups
python            v3_9_2          -f Linux64bit+3.10-2.17                    -z /mu2e/ups
qt                v5_12_3a        -f Linux64bit+3.10-2.17 -q e20             -z /mu2e/ups
range             v3_0_11_0       -f NULL                                    -z /mu2e/ups
root              v6_22_08d       -f Linux64bit+3.10-2.17 -q e20:p392:prof   -z /mu2e/ups
sqlite            v3_34_01_00     -f Linux64bit+3.10-2.17                    -z /mu2e/ups
swig              v4_0_2          -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
tbb               v2021_1_1       -f Linux64bit+3.10-2.17 -q e20             -z /mu2e/ups
TRACE             v3_17_03        -f NULL                                    -z /mu2e/ups
ups               v6_0_8          -f Linux64bit+3.10-2.17                    -z /mu2e/ups
xdaq              v16_7_0_1       -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
xerces_c          v3_2_3          -f Linux64bit+3.10-2.17 -q e20:prof        -z /cvmfs/mu2e.opensciencegrid.org/artexternals
xmlrpc_c          v1_51_06        -f Linux64bit+3.10-2.17 -q e20:prof        -z /mu2e/ups
xrootd            v5_1_0          -f Linux64bit+3.10-2.17 -q e20:p392:prof   -z /mu2e/ups
mu2etrig@mu2edaq13:/home/mu2etrig/test_stand/ots:
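
In the listing above, root is set up from /mu2e/ups, while the PCM path in the error points under /cvmfs/mu2e.opensciencegrid.org/artexternals. To make that kind of mix easier to spot, here is a rough sketch that groups the `ups active` output by the product database given after `-z`. It assumes the column layout shown above (name, version, then flags); `group_by_database` is a hypothetical helper, not a ups command.

```python
from collections import defaultdict


def group_by_database(ups_active_text):
    """Group (product, version) pairs by the database path following '-z'.

    Lines without a '-z' flag (the header line, shell prompts) are ignored.
    """
    groups = defaultdict(list)
    for line in ups_active_text.splitlines():
        tokens = line.split()
        if "-z" not in tokens:
            continue
        idx = tokens.index("-z")
        # Need at least name and version before '-z', and a path after it.
        if idx < 2 or idx + 1 >= len(tokens):
            continue
        groups[tokens[idx + 1]].append((tokens[0], tokens[1]))
    return dict(groups)
```

Feeding it the text of `ups active` gives one list of products per database, which makes it quick to see which area each ROOT-related product is actually coming from.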

eflumerf commented 2 years ago

Adding this to art_root_io v1_08_03 resolves the crash condition and things still appear to work (still doing more testing now):

diff --git a/art_root_io/detail/RootErrorClassifier.cc b/art_root_io/detail/RootErrorClassifier.cc
index 4394fb5..0ced533 100644
--- a/art_root_io/detail/RootErrorClassifier.cc
+++ b/art_root_io/detail/RootErrorClassifier.cc
@@ -20,6 +20,7 @@ namespace {
                              " is available"s)) {
         return true;
       }
+      if(parser.has_message("rdict")) { return true; }
     }
     return false;
   }

EDIT: The job runs, but then when we go to write data from the DataLogger, art crashes nastily with lots of cling errors, so I think this is not a solution.

eflumerf commented 2 years ago

I've removed our locally-installed ROOT, and that appears to have fixed the issue even without the hack in art_root_io. However, now when we try to write files, we get a large number of errors printed to stdout of this form:

Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
   Missing FileEntry for Offline/MCDataProducts/inc/KalSeedMC.hh
   requested to autoload type mu2e::KalSeed

eflumerf commented 2 years ago

The last issue may be due to oddness in ROOT_INCLUDE_PATH. I think there's no art bug here, after all, so I'm going to close.
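
For reference, a quick way to test the ROOT_INCLUDE_PATH theory is to check whether the header named in the cling error resolves against any entry of that variable. This is only a sketch: it assumes cling looks the relative header path up under each ROOT_INCLUDE_PATH entry, and the helpers `resolve_header` and `missing_entries` are hypothetical names introduced for illustration.

```python
import os
from pathlib import Path


def resolve_header(header, include_path):
    """Return the include-path entries under which the relative header exists."""
    hits = []
    for entry in include_path.split(os.pathsep):
        if entry and (Path(entry) / header).is_file():
            hits.append(entry)
    return hits


def missing_entries(include_path):
    """Return the include-path entries that are not existing directories."""
    return [
        entry
        for entry in include_path.split(os.pathsep)
        if entry and not Path(entry).is_dir()
    ]
```

For example, `resolve_header("Offline/MCDataProducts/inc/KalSeedMC.hh", os.environ.get("ROOT_INCLUDE_PATH", ""))` returning an empty list would be consistent with the "Missing FileEntry" message, and `missing_entries` would flag stale directories left over from an earlier setup.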

eflumerf commented 1 year ago

This has reared its ugly head again with root v6_26_06 and art v3_12_00 (art_root_io v1_11_00).

knoepfel commented 1 year ago

@eflumerf, can you provide instructions to reproduce this?

eflumerf commented 1 year ago

I see the problem when trying to load an art file after setting up our UPS version of the Mu2e offline product. This points to an environment issue with that product.

knoepfel commented 1 year ago

Update: I have performed the following setup steps:

$ setup "offline" "d10_19_00" -f "Linux64bit+3.10-2.17-sl7-9" -q "e20:prof:s118:trig"
$ setup art_root_io v1_11_00 -q e20:prof  # <== Uses root v6_26_06
$ art -c /dev/null trigger_driver_rootOutput_1.art -o out.root
INFO: provided configuration file '/dev/null' is empty: 
using minimal defaults and command-line options.
INFO: using default process_name of "DUMMY".
%MSG-i MF_INIT_OK:  Early 11-Jan-2023 11:09:12 CST JobSetup
Messagelogger initialization complete.
%MSG
11-Jan-2023 11:09:12 CST  Initiating request to open input file "trigger_driver_rootOutput_1.art"
11-Jan-2023 11:09:13 CST  Opened input file "trigger_driver_rootOutput_1.art"
Begin processing the 1st record. run: 10904 subRun: 1 event: 3 at 11-Jan-2023 11:09:13 CST
11-Jan-2023 11:09:13 CST  Opened output file with pattern "out.root"
%MSG-w FastCloning:  PostProcessEvent 11-Jan-2023 11:09:13 CST  run: 10904 subRun: 1 event: 3
Fast cloning has been deactivated for the following reasons:
 - Events are not in entry order.
 - The splitting level and/or basket size does not match between input and output file.
%MSG
Begin processing the 2nd record. run: 10904 subRun: 1 event: 4 at 11-Jan-2023 11:09:13 CST

...

MemReport  ---------- Memory summary [base-10 MB] ------
MemReport  VmPeak = 1071.87 VmHWM = 457.322

Art has completed and will exit with status 0.

However, I had to adjust the offline.table file to remove the debug/prof portions of the xerces_c qualifier. The current suspicion is that there is a problem with the TDAQ runtime environment.

knoepfel commented 1 year ago

The problem is understood; see https://github.com/KFTrack/KinKal/issues/166 for details.

knoepfel commented 1 year ago

Resolved per KFTrack/KinKal#166.