
ProductNotFound issue in T0 HLTMonitor processing jobs starting Run 378985 #44643

Open saumyaphor4252 opened 5 months ago

saumyaphor4252 commented 5 months ago

Reporting the T0 processing error in jobs for the HLTMonitor stream for Run 378985, detailed in https://cms-talk.web.cern.ch/t/express-paused-jobs-run2024b-productnotfound-error/38544

CRITICAL:root:Error running cmsRun
{'arguments': ['/bin/bash', '/srv/job/WMTaskSpace/cmsRun1/cmsRun1-main.sh', '', 'el8_amd64_gcc12', 'scramv1', 'CMSSW', 'CMSSW_14_0_4', 'FrameworkJobReport.xml', 'cmsRun', 'PSet.py', '
']}
CMSSW Return code: 8006

CRITICAL:root:Error message: An exception of category 'ProductNotFound' occurred while
[0] Processing Event run: 378985 lumi: 224 event: 220047314 stream: 6
[1] Running path 'dqmoffline_step'
[2] Calling method for module TrackRefitter/'hltrefittedForPixelDQM'
Exception Message:
RefCore: A request to resolve a reference to a product of type 'std::vector<reco::TrackExtra>' with ProductID '1:3256'
can not be satisfied because the product cannot be found.
Probably the branch containing the product is not stored in the input file.
Additional Info:
[a] If you wish to continue processing events after a ProductNotFound exception,
add "TryToContinue = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration.

The tarball for the PSet is available at

/afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024B/ProductNotFound/vocms014.cern.ch-27238-3-log.tar.gz

Somewhat similar symptoms are also seen for Run 378993, with the error

type: "Fatal Exception"
details: "An exception of category 'ProductNotFound' occurred while 
   [0] Processing Event run: 378993 lumi: 188 event: 223221502 stream: 1  
   [1] Running path 'dqmoffline_step' 
   [2] Calling method for module TrackingMonitor/'gsfTracksMonitoringHLT' 
Exception Message: 
RefCore: A request to resolve a reference to a product of type 'std::vector<reco::TrackExtra>' with ProductID '1:3118' can not be satisfied because the product cannot be found. 
Probably the branch containing the product is not stored in the input file. 
  Additional Info: 
        [a] If you wish to continue processing events after a ProductNotFound exception, add "TryToContinue = cms.untracked.vstring('ProductNotFound')" to the "options" PSet in the configuration."
exitCode: 8006
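
(For reference, the escape hatch suggested in the exception text would look like this when appended to the Tier-0 job's PSet.py; this is just a sketch of the option and it masks the symptom rather than fixing the missing branch.)

# Sketch only: appended to the existing cmsRun configuration (process is defined there).
# Skips the dependent modules instead of failing the job on ProductNotFound.
process.options.TryToContinue = cms.untracked.vstring('ProductNotFound')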

FYI @cms-sw/dqm-l2. This may also be relevant to the online HLT DQM client crashes starting in Run 378981 reported at the DRM today: https://cmsweb.cern.ch/dqm/dqm-square/api?what=get_logs&id=dqm-source-state-run378981-hostdqmfu-c2b03-45-01-pid3057526&db=production

cmsbuild commented 5 months ago

cms-bot internal usage

cmsbuild commented 5 months ago

A new Issue was created by @saumyaphor4252.

@makortel, @smuzaffar, @Dr15Jones, @sextonkennedy, @antoniovilela, @rappoccio can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

perrotta commented 5 months ago

assign dqm

cmsbuild commented 5 months ago

New categories assigned: dqm

@rvenditti,@syuvivida,@tjavaid,@nothingface0,@antoniovagnerini you have been requested to review this Pull request/Issue and eventually sign? Thanks

mmusich commented 5 months ago

just for the record, the PoolOutputModule of the HLTMonitor stream is configured such that it sends the right event content to be consumed at Tier-0:

      'keep *_hltEgammaGsfTracks_*_*',
      'keep *_hltMergedTracksForBTag_*_*',
      'keep *_hltMergedTracks_*_*',
      'keep *_hltPixelTracks_*_*',

thus the recoTrackExtras_*__HLT branches should be present in the input streamer files. I suspect something along the lines of https://github.com/cms-sw/cmssw/issues/39064.

Pranjal033 commented 5 months ago

We copied a few affected LSs from run 378981 (LS 455-465) to the playback region and reproduced this crash at playback; here is its log file.

The information about how to reproduce the crash on lxplus can be found here, and the streamers have been copied to this path: /eos/cms/store/group/comm_dqm/Collisions24_tempStreamers/
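
For reference, a minimal cmsRun source block for reading the copied streamer files directly might look like this (a sketch; the file name below is a placeholder, not an actual file at that path):

import FWCore.ParameterSet.Config as cms

process = cms.Process("REPLAY")
# Read HLT streamer (.dat) files directly; the file name is a placeholder.
process.source = cms.Source("NewEventStreamFileReader",
    fileNames = cms.untracked.vstring(
        'file:/eos/cms/store/group/comm_dqm/Collisions24_tempStreamers/run378981_ls0455_streamHLTMonitor.dat'
    )
)
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(-1))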

syuvivida commented 5 months ago

assign core

cmsbuild commented 5 months ago

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

saumyaphor4252 commented 5 months ago

Dear all, the issue is now becoming a showstopper for T0 operations, with hundreds of paused jobs in the HLTMonitor stream and PromptReco processing now stopped starting from Run 378981: https://cms-talk.web.cern.ch/t/promptreco-is-paused-for-run2024b/38673.

Can the experts please look into it with high priority?

Thanks and regards, Saumya (incoming ORM)

makortel commented 5 months ago

Can the experts please look into it with high priority?

Starting now.

sextonkennedy commented 5 months ago

Thanks Matti. Note that at the Joint Ops meeting DQM reported that the TrackExtra collection is really not in the file output by the HLT. So the process that really needs to be debugged is the HLT executable, to determine why, in a small fraction of events, the HLT does not manage to write out the TrackExtras.

makortel commented 5 months ago

The tarball for the PSet is available at

/afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024B/ProductNotFound/vocms014.cern.ch-27238-3-log.tar.gz

Looking at this reproducer, the hltrefittedForPixelDQM module that throws the exception consumes hltMergedTracks. According to EventContentAnalyzer the std::vector<TrackExtra> from hltMergedTracks is there, but with ProductID 1:3264, whereas the exception complains about the std::vector<TrackExtra> with ProductID 1:3256 being missing.
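
(For reference, the EventContentAnalyzer dump mentioned above can be produced by appending something like this to the reproducer configuration; a sketch, with an arbitrary module label:)

# Appended to the PSet.py from the tarball (process and cms are already defined there):
# dumps every product in the event, including its ProductID, to the log.
process.dumpContent = cms.EDAnalyzer("EventContentAnalyzer")
process.dumpPath = cms.Path(process.dumpContent)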

makortel commented 5 months ago

Instrumenting TrackProducerAlgorithm<T>::runWithTrack() for the case that throws the exception, I see that in event 378985:224:220047314 the module's input (hltMergedTracks) has a total of 326 tracks. The first 316 tracks have algo hltIter0 and the last 10 have undefAlgorithm. For all tracks the TrackExtraRef has a ProductID of 1:3256.

In the previous event (378985:224:219871457) in the file all the TrackExtraRefs have (the correct) ProductID 1:3264.
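
For completeness, a lighter-weight way to inspect the same Refs, once the affected events are available as an EDM/ROOT file (FWLite cannot read streamer files directly), could be a sketch like this; the input file name is a placeholder:

from DataFormats.FWLite import Events, Handle

events = Events("hltMonitor_events.root")  # placeholder EDM file with the affected events
tracks = Handle("std::vector<reco::Track>")

for ev in events:
    ev.getByLabel("hltMergedTracks", tracks)
    # ProductID of the TrackExtra collection each track points to (process index, product index)
    ids = {(t.extra().id().processIndex(), t.extra().id().productIndex())
           for t in tracks.product()}
    aux = ev.eventAuxiliary()
    print(aux.run(), aux.luminosityBlock(), aux.event(), ids)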

makortel commented 5 months ago

@cms-sw/hlt-l2 Is it possible to find out after the fact if the events 378985:224:219871457 and 378985:224:220047314 were processed by the same cmsRun process or by two different processes (either same or different node)?

slava77 commented 5 months ago

type tracking

dan131riley commented 5 months ago

Maybe the EventAuxiliary processGUID?
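
(For reference, once the events are available as an EDM/ROOT file the GUID could be read e.g. with FWLite; a sketch with a placeholder file name:)

from DataFormats.FWLite import Events

# Placeholder input file; events from the same cmsRun process share the same GUID.
events = Events("hltMonitor_events.root")
for ev in events:
    aux = ev.eventAuxiliary()
    print(aux.run(), aux.luminosityBlock(), aux.event(), aux.processGUID())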

makortel commented 5 months ago

A recipe to rerun the HLT menu used to process these (kind of) events would also help in the investigation, even if it would not reproduce this problem exactly.

makortel commented 5 months ago

Did anything change e.g. in HLT menu or how HLT is being run in DAQ in Run 378985?

mmusich commented 5 months ago

A recipe to rerun the HLT menu used to process these (kind of) events would also help in the investigation, even if it would not reproduce this problem exactly.

Here's a possible recipe (not guaranteed to reproduce):

#!/bin/bash -ex

# CMSSW_14_0_4

hltGetConfiguration run:378985  \
  --globaltag 140X_dataRun3_HLT_v3 \
  --data \
  --no-prescale \
  --output full \
  --max-events -1 \
  --input /store/data/Run2024B/EphemeralZeroBias0/RAW/v1/000/378/985/00000/f5f542ca-b93e-46e9-a136-7e9f1740218a.root \
  > hlt.py

cat <<@EOF >> hlt.py
process.hltOutputFull.outputCommands = [
    'keep *',
    'drop *_hltSiPixelDigisLegacy_*_*',
    'drop *_hltSiPixelClustersLegacy_*_*',
    'drop *_hltSiPixelRecHitsFromLegacy_*_*',
    'drop *_hltEcalDigisLegacy_*_*',
    'drop *_hltEcalUncalibRecHitLegacy_*_*',
    'drop *_hltHbherecoLegacy_*_*',
    ]
@EOF

cmsRun hlt.py &> hlt.log

(the customization of outputCommands is needed because of https://github.com/cms-sw/cmssw/issues/37207).

mmusich commented 5 months ago

Did anything change e.g. in HLT menu or how HLT is being run in DAQ in Run 378985?

adding to the ticket @mzarucki @trtomei @trocino @smorovic

mzarucki commented 5 months ago

Did anything change e.g. in HLT menu or how HLT is being run in DAQ in Run 378985?

Run 378985 was the second run with stable beams at 13.6 TeV; however, the menu did not change wrt. the first stable-collisions run, 378981.

The menu deployed for the first 13.6 TeV collisions (for both runs) is the full p-p physics menu (V1.0) with L1T seeds and HLT paths enabled: /cdaq/physics/Run2024/2e34/v1.0.3/HLT/V2

smorovic commented 5 months ago

Hello. As far as I know, there was nothing different, globally, from the DAQ side. One problem in this range is starting from 378981 until Sunday evening when we had a single FU with no GPUs detected by CMSSW and thus running with CPU only, if that could've had any effect.

This was fixed from 379067 with a full reset of one GPU on the host (also, possibly, shorter periods on Saturday didn't have this problem, when we attempted another fix, but the problem returned quickly).

mmusich commented 5 months ago

One problem in this range is starting from 378981 until Sunday evening when we had a single FU with no GPUs detected by CMSSW and thus running with CPU only, if that could've had any effect.

There is a correlation, it seems.
Judging from https://dmytro.web.cern.ch/dmytro/cmsprodmon/tier0.php the last run with this kind of failure was 379058, so it's consistent with the change occurring at 379067.

smorovic commented 5 months ago

That machine processed 300 Hz out of over 50 kHz (one of almost 200 FUs). Under the hypothesis that product IDs are different in case of CPU+GPU and CPU only, could the streamer INI files also be different? Also, the INI file for a stream is picked up from one FU (process) for each stream at the beginning of the run (typically the first one that arrives at each merging stage).

makortel commented 5 months ago

Under the hypothesis that product IDs are different in case of CPU+GPU and CPU only

With the HLT recipe https://github.com/cms-sw/cmssw/issues/44643#issuecomment-2043314081 I can confirm this is the case. The std::vector<TrackExtra> produced by hltMergedTracks:HLTX has ProductID 2:3263 when run on lxplus8-gpu, and ProductID 2:3255 when run without a GPU (the IDs are off by 1 compared to the failed jobs, but the recipe doesn't seem to be an exact replica of the online HLT).

I suspect the cause lies in the "module type resolver" approach for the @alpaka modules, where on a GPU node the @alpaka modules produce more transient data products than on a CPU node. The impact of this on the Refs was an oversight, and needs some careful thought now.

could also streamer INI files be different?

Could you remind me (e.g. by pointing to the code) what exactly is stored in the INI files? From what I recall, the INI files could well have a difference. My feeling is that the fix needs to be something else than keeping multiple INI files, though.

smorovic commented 5 months ago

Under the hypothesis that product IDs are different in case of CPU+GPU and CPU only

With the HLT recipe #44643 (comment) I can confirm this is the case. The std::vector<TrackExtra> produced by hltMergedTracks:HLTX has ProductID 2:3263 when run on lxplus8-gpu, and ProductID 2:3255 when run without a GPU (the IDs are off by 1 compared to the failed jobs, but the recipe doesn't seem to be an exact replica of the online HLT).

I suspect the cause lies in the "module type resolver" approach for the @alpaka modules, where on a GPU node the @alpaka modules produce more transient data products than on a CPU node. The impact of this on the Refs was an oversight, and needs some careful thought now.

could also streamer INI files be different?

Could you remind me (e.g. by pointing to the code) what exactly is stored in the INI files? From what I recall, the INI files could well have a difference. My feeling is that the fix needs to be something else than keeping multiple INI files, though.

It is serialized here: https://github.com/smorovic/cmssw/blob/master/IOPool/Streamer/src/StreamSerializer.cc#L53

Keeping multiple INI files would be difficult. One copy is kept to save size, and is prepended to the streamer file at the end of merging the files from all FUs. There is a checksum comparison at the level of a single FU, checking the INIs for any differences between copies, but later in the merging they aren't compared to each other. The format / streamer source also currently doesn't parse multiple INIs (I don't remember if we can now have them intertwined in the file, but I think I added that).

Until this is resolved, it would be good to disable CPU-only fallbacks in HLT menus (if that can be done).

fwyzard commented 5 months ago

assign heterogeneous

I suspect the cause lies in the "module type resolver" approach for the @alpaka modules, where on a GPU node the @alpaka modules produce more transient data products than on a CPU node. The impact of this to the Refs was an oversight, and needs some careful thought now.

cmsbuild commented 5 months ago

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

fwyzard commented 5 months ago

Until this is resolved, it would be good to disable CPU-only fallbacks in HLT menus (if that can be done).

I don't think this can easily be done. For example, setting

process.options.accelerators = [ 'gpu-nvidia' ]

actually breaks the menu :-/, because the "CPU only" modules used for the GPU vs CPU comparison fail to run.

fwyzard commented 5 months ago

Actually, it can be done by calling

process.ProcessAcceleratorAlpaka.setBackend("cuda_async")

but I don't think this can be set in the menu itself. Maybe it could be added by hltd?

Of course, if there are no GPUs the jobs will fail to start.

smorovic commented 5 months ago

Actually, it can be done by calling

process.ProcessAcceleratorAlpaka.setBackend("cuda_async")

but I don't think this can be set in the menu itself. Maybe it could be added by hltd?

Of course, if there are no GPUs the jobs will fail to start.

We could add it to the DAQ patch. It is defined in the RCMS software template, so we need to make a new one in the DB and regenerate the DAQ configuration with it. Hilton can pick it up from gitlab (but it doesn't have to); otherwise it'll just be added to CDAQ machines.

We could also put a try/except around this line in case the Alpaka accelerator is not in the menu (miniDAQ, emulator; unless this is added everywhere by the confDB libs).
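
Something along these lines, i.e. a fragment the DAQ patch could append to the generated configuration (an illustrative sketch only):

# Force the CUDA backend for Alpaka modules, but tolerate configurations that
# have no ProcessAcceleratorAlpaka (miniDAQ, emulator) -- illustrative only.
try:
    process.ProcessAcceleratorAlpaka.setBackend("cuda_async")
except AttributeError:
    pass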

I'll follow up tomorrow on this.

saumyaphor4252 commented 5 months ago

Judging from https://dmytro.web.cern.ch/dmytro/cmsprodmon/tier0.php the last run with this kind of failure was 379058, so it's consistent with the change occurring at 379067.

Just to add that another occurrence has been spotted in Run 379154 with multiple job failures: https://cmsweb.cern.ch/t0_reqmon/data/jobdetail/Express_Run379154_StreamHLTMonitor

mmusich commented 5 months ago

Just to add that another occurrence has been spotted in Run 379154

Indeed, also for this run there has been a non-zero rate for Status_OnCPU (OMS):

[Screenshot from 2024-04-09: OMS plot showing the non-zero Status_OnCPU rate]

so, it does still fit the hypothesis.

smorovic commented 5 months ago

Yes, as far as I see, there was again a problem with the same FU machine (fu-c2b05-03-01). I'll ask the on-call to blocklist it.

silviodonato commented 5 months ago

Please note that we got crashes in the HLT farm related to GPUs when we started to see rates from Status_OnCPU http://cmsonline.cern.ch/cms-elog/1209378

fwyzard commented 5 months ago

Please note that we got crashes in the HLT farm related to GPUs when we started to see rates from Status_OnCPU

That would be expected in case of hardware GPU errors.

silviodonato commented 5 months ago

I tried to reproduce the errors of 378981 both from .root files (root://eoscms.cern.ch//eos/cms/store/group/tsg/FOG/error_stream/run378981/run378981_ls0307_index000158_fu-c2b05-03-01_pid*) and from .raw files (root://eoscms.cern.ch//eos/cms/store/group/tsg/FOG/error_stream_root/run378981/run378981_ls0307_index000096_fu-c2b05-03-01_pid*) with PIDs = 1666323, 1666247, 1666296, 1666276 taken from the F3Mon crashes, and I didn't manage to reproduce any error.

Same result for errors in 379154. In this case I used directly /store/error_stream/run379154/run379154_ls0642_index000025_fu-c2b05-03-01_pid*.raw with PIDs = 2772474, 2772502, 2772522, 2772553.

Note that all 8 errors (4 from 378981 and 4 from 379154) contain 'cudaErrorECCUncorrectable': 'uncorrectable ECC error encountered'!

Everything seems to confirm that it is a hardware error on the GPU, as commented above, which cannot be reproduced easily. So I will not open a separate CMSSW issue (unless anyone prefers to have one).

makortel commented 5 months ago

Continuing from https://github.com/cms-sw/cmssw/issues/44643#issuecomment-2043600494

After more experimentation I confirm the problem is caused by the combination of

  1. A mixture of CPU-only and CPU+GPU processes
  2. Alpaka modules producing different sets of data products depending on the backend
    • All backends of an Alpaka-based module produce the same set of persistent data products (which is a requirement). On a GPU backend the device-side data products are transient, and are implicitly copied to the host (as the persistent data products). On the CPU backend the persistent data products are produced directly.
  3. Use of streamer files, the way the data is split in INI and event data files, and how those files are handled in the online system
    • I.e. assuming the INI files (and the metadata they contain) are identical for all processes, and all but one of the INI files can be discarded.
      • Essentially the points 1+2 above break the assumption

I have a small reproducer to mimic the behavior of 1+2, which demonstrates that the different processes result in different ProductIDs. With the ROOT file format, when two files from different processes are merged into one, the Refs stay functional, because the framework keeps the necessary metadata along with the Events. Also with the streamer data format, if the two different processes produce "full streamer files", i.e. each file contains the INI header and the event data, the Refs stay functional (again, the framework has the necessary metadata available). I'm still working on replicating the behavior when only the event data parts of the two streamer files are concatenated under one INI header.

Also to note explicitly, the problem is limited to the use of Refs (and Ref-like things).


I believe I can cook up a quick (and ugly) workaround addressing point 2, for the specific case of the present HLT menu and farm. This workaround is not a good solution for longer term, but would help us to move forward while we work on a good long-term solution.

For the long term, the framework team would want to address point 3. We would suggest evolving the streamer file format in one of the following ways:

  1. We add a new section for process-level metadata to the "event data portion" of the streamer file format.
    • For the purpose of resolving this particular issue this new section would contain the BranchIDLists, but could be used later to add more process-level metadata that can genuinely differ between the processes (e.g. https://github.com/cms-sw/cmssw/issues/30044)
  2. We associate a hash (or "unique ID") to each INI header, and record the hash as part of the Event data,
    • Would require the online system to collect all the unique INI files, and to add them to the final streamer file
    • Would allow recording (accidental or intentional) ParameterSet differences among the processes
  3. We write the necessary metadata as part of the Event data
    • Might be the option requiring the least amount of effort to implement
    • Would take the most disk space of these 3 options

There is also an additional possibility, kind of between the workarounds and the long-term options, that would allow to keep the present streamer file format. The online system would identify the unique INI files, and the event data files associated to each of the unique INI file. Each unique INI file would be concatenated with the corresponding event data files to form one streamer file (i.e. in this particular case there would be two streamer files instead of one). I'd imagine this option to be the least favorable though because of the implications to bookkeeping when data is transferred from P5 to Tier0 (but I also don't know much about that).

makortel commented 5 months ago

I'm still working on replicating the behavior when only the event data parts of the two streamer files are concatenated under one INI header.

Now I can also reproduce the failure in de-referencing Ref in this particular case.

fwyzard commented 5 months ago

Of the three options

  1. We add a new section for process-level metadata to the "event data portion" of the streamer file format.
  2. We associate a hash (or "unique ID") to each INI header, and record the hash as part of the Event data,
  3. We write the necessary metadata as part of the Event data

number 2 seems to be the one with the least impact on how the current system works, while adding a bit of robustness.

It does require a change to the mergers (compare the checksums of the INI headers and keep all the different ones, instead of keeping just the first one) - but such a comparison looks like something we should probably implement anyway, as a safety check?
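
Conceptually, the merger-side change could look like the following (a Python-level sketch of the idea only, not the actual merger code):

import zlib

# Keep one copy of each distinct INI header, keyed by a checksum of the full file,
# instead of keeping only the first INI encountered (illustrative sketch).
unique_inis = {}

def collect_ini(path):
    with open(path, "rb") as f:
        payload = f.read()
    unique_inis.setdefault(zlib.adler32(payload) & 0xFFFFFFFF, payload)

# At the end of merging, all payloads in unique_inis would be prepended
# to the merged streamer file instead of a single INI copy.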

smorovic commented 5 months ago
  1. We add a new section for process-level metadata to the "event data portion" of the streamer file format.

If the branch lists don't take a lot of space (up to a few % would be OK) compared to the event size (which is of order MB in our case), then this is a good way. Also, there were other use cases for the process block. Maybe it could be optional (per stream, i.e. per output module) for cases like heavy ion where we are output-bandwidth limited.

  • For the purpose of resolving this particular issue this new section would contain the BranchIDLists, but could be used later to add more process-level metadata that can genuinely differ between the processes (e.g. Store "architecture" in event provenance #30044)
    2. We associate a hash (or "unique ID") to each INI header, and record the hash as part of the Event data,

This is best in terms of output size, and it should not be too complex to do in the merging system. We could just lump together and prepend any INI files that don't have an identical hash (assuming they don't differ for every host).

  • Would require the online system to collect all the unique INI files, and to add them to the final streamer file
  • Would allow recording (accidental or intentional) ParameterSet differences among the processes
    3. We write the necessary metadata as part of the Event data

This could be unfeasible because of the size we would add to each event (and we are often bandwidth-limited in the merging and in the transfers to Tier0). For the current collision menus the INI files are ~7 MB. They aren't compressed, but when I do compress them the size drops to 600 kB, which is still large. Independently, we could benefit from INI compression and from having a hash for integrity checks.

  • Might be the option requiring the least amount of effort to implement
  • Would take the most disk space of these 3 options

There is also an additional possibility, kind of between the workarounds and the long-term options, that would allow to keep the present streamer file format. The online system would identify the unique INI files, and the event data files associated to each of the unique INI file. Each unique INI file would be concatenated with the corresponding event data files to form one streamer file (i.e. in this particular case there would be two streamer files instead of one). I'd imagine this option to be the least favorable though because of the implications to bookkeeping when data is transferred from P5 to Tier0 (but I also don't know much about that).

This is a redesign of what we have in terms of bookkeeping and file handling, both within the merging chain and also concerning the handshake with Tier0 and their reprocessing jobs. It is more of a long-term than a mid-term option, and 1 or 2 seem better options in that sense.

fwyzard commented 5 months ago

2. We associate a hash (or "unique ID") to each INI header, and record the hash as part of the Event data,

This is best in terms of output size, and it should not be too complex to do in the merging system. We could just lump together and prepend any INI files that don't have an identical hash (assuming they don't differ for every host).

In principle INI files can be different for jobs on the same host (e.g. 1 GPU is working and the other GPU is not working) or for jobs on different hosts (e.g. one host has all GPUs working and another has no GPUs). So the logic should be implemented at all merger levels.

smorovic commented 5 months ago
  2. We associate a hash (or "unique ID") to each INI header, and record the hash as part of the Event data,

This is best in terms of output size, and it should not be too complex to do in the merging system. We could just lump together and prepend any INI files that don't have an identical hash (assuming they don't differ for every host).

In principle INI files can be different for jobs on the same host (e.g. 1 GPU is working and the other GPU is not working) or for jobs on different hosts (e.g. one host has all GPUs working and another has no GPUs). So the logic should be implemented at all merger levels.

Agreed, it can happen also between processes.

More thinking about it, with the CPU fallback disabled I don't see a need for a workaround (options 1-2...). For HLT we can assume that we will run the whole cluster either with GPUs or without GPUs (at least in Run 3). Only if there is a problem with hardware (e.g. new FUs arrive but, for some reason, the GPUs on the new hosts cannot be used) could we have a problem.

fwyzard commented 5 months ago

More thinking about it, with the CPU fallback disabled I don't see a need for a workaround (options 1-2...). For HLT we can assume that we will run the whole cluster either with GPUs or without GPUs (at least in Run 3).

Sorry, who decided that this is how we will be running for the rest of Run 3?

I understood the "fix" as a quick'n'dirty hack to prevent the problem with Tier-0 jobs while a better solution is being investigated.

Not as a permanent solution for the rest of Run 3.

smorovic commented 5 months ago

More thinking about it, with the CPU fallback disabled I don't see a need for a workaround (options 1-2...). For HLT we can assume that we will run the whole cluster either with GPUs or without GPUs (at least in Run 3).

Sorry, who decided that this is how we will be running for the rest of Run 3?

I understood the "fix" as a quick'n'dirty hack to prevent the problem with Tier-0 jobs while a better solution is being investigated.

Not as a permanent solution for the rest of Run 3.

OK, noted, I take it back. It was my naive assumption that we would keep a homogeneous setup in Run 3, but, if that is not the case, we need a long-term solution.

smorovic commented 5 months ago

I made an estimate of the additional bandwidth for option 1 on the current HLT farm.

After removing the PSetMap writing in IOPool/Streamer/src/StreamSerializer.cc, the size of the uncompressed INI files among the streams of the HLT menu of run 378981 is much lower, between 54K and 66K (the largest one being HLTMonitor). This size seems to come from the branch ID lists (sd.setBranchIDLists(branchIDLists);), while the thinnedAssociationsHelper doesn't have any effect on the size. Compressing the HLTMonitor file with tar cjf, I get around 34K.

Taking into account that we run about 200 FUs with 8 processes each in the current configuration (this will increase by ~20% with the new FUs), if we had one process block per stream file of a single process (we write one such file every lumisection), there would be 1600 such files written per stream per lumisection, or almost 69 Hz per stream. Let's assume 70 streams each lumisection, which is fewer than we have now in ramp-up. We are probably running with fewer streams overall. Note: if a stream has no events in a lumisection, the process block could also be skipped.

69 Hz × 70 streams × 0.034 MB => ~164 MB/s. Probably around 200 MB/s with the new FUs. I think this looks acceptable compared to the 10 to 12 GB/s expected this year for pp runs.
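
For transparency, the arithmetic behind these numbers (all inputs as assumed above):

# Back-of-the-envelope check of the bandwidth estimate above.
n_fus, procs_per_fu = 200, 8       # current farm configuration
ls_seconds = 23.3                  # nominal lumisection length
ini_size_mb = 0.034                # compressed HLTMonitor INI without the PSet map
n_streams = 70                     # assumed number of streams per lumisection

files_per_ls = n_fus * procs_per_fu                   # 1600 files per stream per lumisection
rate_hz = files_per_ls / ls_seconds                   # ~69 Hz per stream
bandwidth_mb_s = rate_hz * n_streams * ini_size_mb    # ~163 MB/s
print(files_per_ls, round(rate_hz), round(bandwidth_mb_s), "MB/s")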

makortel commented 5 months ago

Thanks for the feedback. With @Dr15Jones and @wddgit we started to look into the details of "option 2" (and we will continue on this route unless decided otherwise, or we hit a blocker).

We came to the conclusion that the Adler32 checksum stored in the InitMsgView https://github.com/cms-sw/cmssw/blob/1471374377d3423894ee8a8d29b95041c9cfeb38/IOPool/Streamer/interface/InitMessage.h#L89 would do the necessary job of disambiguating the different Init messages at the granularity needed for the framework metadata. (For reference, that checksum corresponds to the serialized data of SendJobHeader, which includes https://github.com/cms-sw/cmssw/blob/e4277131533a7943946bfc21d1d6b858a83128a6/IOPool/Streamer/src/StreamSerializer.cc#L67-L69 )

We have two questions to DAQ at this stage:

  • (more out of curiosity) How would the merging identify which INI files have different content (checksum)? Would it e.g. rely on file naming (in which case the aforementioned checksum would have to be included in the final INI file name), or read the checksum from the INI file, or calculate the checksum of the full INI file on the fly?
  • Do you have preference how the final unique INI files and the event data files are concatenated together? Would it be reasonable to require all the InitMessages (INI files) first, followed by all the event data?

smorovic commented 5 months ago

We have two questions to DAQ at this stage

  • (more out of curiosity) How would the merging identify which INI files have different content (checksum)? Would it e.g. rely on file naming (in which case the aforementioned checksum would have to be included in the final INI file name), or read the checksum from the INI file, or calculate the checksum of the full INI file on the fly?

This will have to be discussed more widely in the DAQ group since one part of the merging chain (all after the FU) is not maintained by me and we should agree on the approach (and feasibility). But, in my opinion, the most straightforward way would be to have a full INI checksum (not adler32 of the payload) calculated on FUs and then appended to the file name to disambiguate and preserve each variant through the merging chain. We now use files of the format runX_ls0000_streamY_pidZ.ini. Later, as it is propagated, it becomes, for example, runX_ls0000_streamY_fu-c2b02-41-01.ini and similar. Another substring, like _checksum98798787, could be added to the file name.
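
As a concrete illustration of that naming scheme (hypothetical; the checksum algorithm shown is just a placeholder, not the one to be agreed on):

import zlib

def ini_name_with_checksum(path):
    """Append a checksum over the full INI file to its name (illustrative only)."""
    with open(path, "rb") as f:
        cksum = zlib.crc32(f.read()) & 0xFFFFFFFF
    return path.replace(".ini", "_checksum%u.ini" % cksum)

# e.g. runX_ls0000_streamY_pidZ.ini -> runX_ls0000_streamY_pidZ_checksum98798787.ini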

  • Do you have preference how the final unique INI files and the event data files are concatenated together? Would it be reasonable to require all the InitMessages (INI files) first, followed by all the event data?

Yes, the INI chunks would definitely come before any events, as now.

makortel commented 5 months ago

I believe I can cook up a quick (and ugly) workaround addressing point 2, for the specific case of the present HLT menu and farm. This workaround is not a good solution for longer term, but would help us to move forward while we work on a good long-term solution.

This workaround is in

makortel commented 5 months ago

This will have to be discussed more widely in the DAQ group since one part of the merging chain (all after the FU) is not maintained by me and we should agree on the approach (and feasibility).

Ok. Can you say anything about the timescale for a decision? I guess if this approach is deemed unfeasible in the end, we'd go with option 1 ("process-level metadata section in the event data files").

But, in my opinion, the most straightforward way would be to have a full INI checksum (not adler32 of the payload) calculated on FUs and then appended to the file name to disambiguate and preserve each variant through the merging chain.

Ok. I think this should also be fine from the framework perspective.