As reported by email: Problem with HIN pLHE+GS request (issue in both production and McM validation)
Validation in McM always ends up with only 1 LS. This blinds us to any issue at the lumisection boundary. We should add a customisation to cmsDriver that defines event_per_lumi.
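A minimal sketch of what such a customisation could look like when the cmsDriver command is assembled in Python. `--customise_commands` is an existing cmsDriver option. The parameter `numberEventsInLuminosityBlock` is an assumption here: it is known from EmptySource, and whether the source used in the pLHE step honours it has not been checked. The helper name and the shortened argument list are hypothetical.

```python
import shlex


def with_events_per_lumi(cmsdriver_args, events_per_lumi):
    """Return a copy of cmsdriver_args with a --customise_commands option appended.

    ASSUMPTION: the job's source accepts the untracked parameter
    numberEventsInLuminosityBlock. This is not verified for the pLHE source.
    """
    if events_per_lumi <= 0:
        raise ValueError("events_per_lumi must be positive")
    command = (
        "process.source.numberEventsInLuminosityBlock="
        f"cms.untracked.uint32({events_per_lumi})"
    )
    # Build a new list so the caller's arguments are left untouched.
    return list(cmsdriver_args) + ["--customise_commands", command]


# Hypothetical, shortened argument list; a real McM test script has many more options.
args = ["cmsDriver.py", "step1", "-n", "10000"]
print(shlex.join(with_events_per_lumi(args, 100)))
```

With something like this in the validation script, a 10000-event test job would be expected to cross several lumisection boundaries instead of staying in 1 LS.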
Detailed email:
Hi All, [adding GEN conveners and Justinas]
I think I am starting to understand the problem with HIN pLHE+GS in 10_3. An example error report can be found at https://cms-unified.web.cern.ch/cms-unified/report/cmsunified_task_HIN-HINPbPbAutumn18pLHE-00005__v1_T_200429_213517_9970
The crash seems to happen at the LS boundary. I reported this before, in early 2019 (*), and we already fixed it then. I have put the details again in https://github.com/cms-sw/cmssw/issues/30070
One question I have tried to answer is why we cannot spot this in McM. The reason is that when McM runs the pLHE + GS validation together, it generates a script like https://cms-pdmv.cern.ch/mcm/public/restapi/chained_requests/get_test/PPD-chain_HINPbPbAutumn18pLHE_flowHINPbPbAutumn18GS_flowHINPbPbAutumn18DRNoPU-00002 and the cmsDriver command does not contain the event_per_lumi customisation. So a job running on condor contains only 1 LS. I confirmed this by looking at the validation report emails from pdmvserv. Note that event_per_lumi is defined at the workflow level only.
The problem is easy to spot if pLHE is done first in production and the GS validation then runs on the LHE dataset, e.g. in my attempt at https://cms-pdmv.cern.ch/mcm/requests?member_of_chain=PPD-chain_HINPbPbAutumn18pLHE_flowHINPbPbAutumn18GS_flowHINPbPbAutumn18DRNoPU-00001&page=0&shown=127
Sorry for taking so long to debug this. I only spotted it when I tried to split pLHE and GS from each other and error reports started to become available on the computing side.
Best, Phat
(*) https://github.com/cms-sw/cmssw/issues/25708
(**)
24-Jan-2020 23:52:06 UTC Initiating request to open file file:HIN-HINPbPbAutumn18pLHE-00002.root
24-Jan-2020 23:52:07 UTC Successfully opened file file:HIN-HINPbPbAutumn18pLHE-00002.root
%MSG-w LogicError: Pythia8HadronizerFilter:generator@beginRun 24-Jan-2020 23:52:09 UTC Run: 1
::getByLabel: An attempt was made to read a Run product before endRun() was called.
The product is of type 'LHERunInfoProduct'.
The specified ModuleLabel was 'source'.
The specified productInstanceName was ''.
%MSG
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 2 at 24-Jan-2020 23:52:48.308 UTC
....
Begin processing the 10000th record. Run 1, Event 10000, LumiSection 1 on stream 3 at 24-Jan-2020 23:55:50.283 UTC
24-Jan-2020 23:55:50 UTC Closed file file:HIN-HINPbPbAutumn18pLHE-00002.root
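To check quickly whether a validation job ever crossed a lumisection boundary, the "Begin processing" lines of a log like the one above can be scanned for distinct LumiSection numbers. This is only a sketch: the regular expression is written against the two record lines quoted here, and the function name and sample text are hypothetical.

```python
import re

# Matches lines such as:
# "Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 2 at ..."
RECORD_RE = re.compile(
    r"Begin processing the \d+(?:st|nd|rd|th) record\. "
    r"Run (\d+), Event (\d+), LumiSection (\d+)"
)


def lumisections_seen(log_text):
    """Return the sorted list of distinct (run, lumisection) pairs in a cmsRun log."""
    pairs = set()
    for match in RECORD_RE.finditer(log_text):
        run, _event, lumi = (int(group) for group in match.groups())
        pairs.add((run, lumi))
    return sorted(pairs)


SAMPLE_LOG = """\
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 2 at 24-Jan-2020 23:52:48.308 UTC
Begin processing the 10000th record. Run 1, Event 10000, LumiSection 1 on stream 3 at 24-Jan-2020 23:55:50.283 UTC
"""

seen = lumisections_seen(SAMPLE_LOG)
print(seen)
if len(seen) < 2:
    print("only one lumisection seen: LS-boundary problems cannot show up in this job")
```

For the excerpt above, all 10000 events sit in Run 1, LumiSection 1, which is consistent with the validation job never reaching an LS boundary.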