art-framework-suite / art

The implementation of the art physics event processing framework

interference of input drop and maxEvents #141

Closed rlcee closed 10 months ago

rlcee commented 11 months ago

Reading an art file succeeds if the number of events requested is less than the number of events in the file. If the number of events requested is larger than the number of events in the file, the executable throws an exception on the first event. The error message points to a problem with the input drop commands being used.

To reproduce, on a Mu2e interactive machine:

source /cvmfs/mu2e.opensciencegrid.org/setupmu2e-art.sh
muse setup SimJob MDC2020aa3
mu2e -c /exp/mu2e/app/users/rlc/trigger0.fcl  (-n=3999, succeeds)
or
mu2e -c /exp/mu2e/app/users/rlc/trigger1.fcl  (-n=4001, throws on first event)

We can currently work around this problem so it is not particularly urgent.

knoepfel commented 10 months ago

@rlcee, the error I see with -n=4001 is:

%MSG-s ArtException:  PostEndJob 04-Dec-2023 11:37:56 CST ModuleEndJob
---- EventProcessorFailure BEGIN
  EventProcessor: an exception occurred during current event processing
  ---- FatalRootError BEGIN
    Fatal Root Error: TStreamerInfo::BuildOld
    Cannot convert mu2e::ComboHitCollection::_parent from type: art::ProductID to type: art::ProductPtr<mu2e::ComboHitCollection>, skip element
    ROOT severity: 2000
  ---- FatalRootError END
---- EventProcessorFailure END
%MSG

If I then look at the streamer on disk for mu2e::ComboHitCollection, I get:

root [4] TClass::GetClass("mu2e::ComboHitCollection")->GetStreamerInfos()->ls("");
OBJ: TObjArray  TObjArray       An array of objects : 0

StreamerInfo for class: mu2e::ComboHitCollection, checksum=0x5df5ad65
  vector<mu2e::ComboHit> vector<mu2e::ComboHit> offset=  0 type=300 ,stl=1, ctype=61,                     
  art::ProductID _parent         offset=  0 type=62                     
  bool           _sorted         offset=  0 type=18 record if this collection was sorted

So the _parent data member on disk is an art::ProductID. However, the type of _parent for the in-memory dictionary is art::ProductPtr<mu2e::ComboHitCollection>, and there is no I/O rule to effect the conversion.

What is your intention with this data product? Are you just trying to read it as part of the process and then skip writing it to the output file?

rlcee commented 10 months ago

In the fcl you can see that we input-dropped the inconsistent products. Is there anything else we have to do to achieve that goal? In any case, it seems that the throw shouldn't depend on the details of -n? Thanks

knoepfel commented 10 months ago

> In any case, it seems that throw shouldn't depend on the details of -n?

When -n=3999 is used, fast-cloning is deactivated, which is reported by art (but suppressed by the services.message configuration in the trigger[01].fcl files):

%MSG-w FastCloning:  PostProcessEvent 04-Dec-2023 11:50:27 CST  run: 1202 subRun: 0 event: 1
Fast cloning has been deactivated for the following reasons:
 - There are fewer events to process than are present in the event tree.
%MSG

When fast-cloning is deactivated, ROOT reads each data product, which is often forgiving. However, when -n=4001, fast-cloning is used, and the cloner checks whether the on-disk and in-memory types are consistent for a given data product. Hence the failure.

I am investigating why ROOT is checking for consistency among branches that are not persisted to the output file. In the meantime, you can disable fast-cloning via RootOutput's fastCloning: false parameter. Longer term, Mu2e should discuss whether an I/O rule would be beneficial for mu2e::ComboHitCollection. Stay tuned.
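
For illustration, the workaround of turning off fast-cloning might look like the FHiCL fragment below. This is a minimal sketch, not the contents of the trigger[01].fcl files: the output module label out1 and the file name are hypothetical, and the parameter spelling fastCloning is assumed to be the one RootOutput accepts.

```fhicl
outputs: {
  out1: {                       # hypothetical output module label
    module_type: RootOutput
    fileName: "output.art"      # hypothetical file name
    fastCloning: false          # assumed spelling; makes ROOT read each product instead of cloning branches
  }
}
```

With an existing configuration, the same setting could instead be appended as a single override line at the end of the file, e.g. outputs.out1.fastCloning: false, again assuming the label out1.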

knoepfel commented 10 months ago

This error results from the lack of an I/O rule. I've implemented an I/O rule at https://github.com/Mu2e/Offline/pull/1156, which resolves the error.

$ art -c trigger1.fcl dig.mu2e.NoPrimaryMix1BBUntriggered.MDC2020r_best_v1_0.001202_00000000.art -n 4001
...
TrigReport ---------- Event summary -------------
TrigReport Events total = 4000 passed = 60 failed = 3940
...
TimeReport ---------- Time summary [sec] -------
TimeReport CPU = 140.284813 Real = 140.444220

MemReport  ---------- Memory summary [base-10 MB] ------
MemReport  VmPeak = 5093.55 VmHWM = 1017.38

Art has completed and will exit with status 0.

kutschke commented 10 months ago

Thanks Kyle. We have finished testing and it passes our tests. So far as Mu2e is concerned, you may close the issue at your convenience.

knoepfel commented 10 months ago

Thanks, Rob. Will do.