dmwm / CRABServer


make sure there is no skipEvent in jobs PSet #7349

Closed belforte closed 1 year ago

belforte commented 2 years ago

With reference to https://cms-talk.web.cern.ch/t/partial-processing-with-automatic-splitting-in-crab-job/13093/12 we found that the CRAB wrapper, at PSet tweaking time, does not make sure that skipEvents is set to zero. If the input PSet has a non-zero value there, it is propagated to the jobs and messes up processing. CRAB needs to make sure that all lumis indicated as input for a file are processed.
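A minimal sketch of the kind of check meant here, not the actual TweakPSet.py code. To stay runnable without CMSSW it uses a plain `SimpleNamespace` as a stand-in for `process.source`, and the helper name `reset_skip_events` is hypothetical. In a real configuration the value would presumably be assigned as `cms.untracked.uint32(0)` rather than a bare integer.

```python
from types import SimpleNamespace


def reset_skip_events(source):
    """Force skipEvents to zero on a source-like object.

    Returns the previous value, or None if the parameter was not set.
    The parameter is only touched when it already exists, so sources
    that do not define skipEvents are left unchanged.
    """
    previous = getattr(source, "skipEvents", None)
    if previous is not None:
        # Stand-in assignment; with a real PSet this would be a
        # cms.untracked.uint32(0) (assumption, not taken from the wrapper code).
        source.skipEvents = 0
    return previous


# Stand-in for process.source from a user PSet with skipEvents=290.
user_source = SimpleNamespace(fileNames=["file.root"], skipEvents=290)
old_value = reset_skip_events(user_source)
if old_value:
    print(f"skipEvents was {old_value}, reset to {user_source.skipEvents}")
```

Returning the old value lets the caller log a warning when a user setting is overridden, instead of changing it silently.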

In this area it is more important to be careful than to push a quick fix; all in all, we have been running like this forever, as far as I can see.

@mapellidario let me know if you'd like to be involved, as a way of "getting to know the details of the job wrapper".

mapellidario commented 2 years ago

Yes, I think that I could benefit from being involved, thanks Stefano!

mapellidario commented 2 years ago

First step: I replicated the behavior.

more info: [1], [2] and [3]

[1] link: https://cmsweb-test11.cern.ch/crabserver/ui/task/220727_124006:dmapelli_crab_20220727_144002

original pset:

```plaintext
process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring('root://cms-xrd-global.cern.ch///store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/8ADD04E5-1776-E711-A1BA-FA163E6741E0.root'))
```

job output:

```plaintext
== CMSSW: 27-Jul-2022 12:43:20 UTC Successfully opened file root://cmsxrootd-site.fnal.gov//store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/8ADD04E5-1776-E711-A1BA-FA163E6741E0.root
== CMSSW: Begin processing the 1st record. Run 1, Event 195301, LumiSection 652 on stream 0 at 27-Jul-2022 12:43:48.566 UTC
== CMSSW: Begin processing the 2nd record. Run 1, Event 195302, LumiSection 652 on stream 0 at 27-Jul-2022 12:43:48.584 UTC
== CMSSW: Begin processing the 3rd record. Run 1, Event 195303, LumiSection 652 on stream 0 at 27-Jul-2022 12:43:48.585 UTC
[...]
== CMSSW: Begin processing the 298th record. Run 1, Event 195598, LumiSection 652 on stream 0 at 27-Jul-2022 12:43:50.877 UTC
== CMSSW: Begin processing the 299th record. Run 1, Event 195599, LumiSection 652 on stream 0 at 27-Jul-2022 12:43:50.878 UTC
== CMSSW: Begin processing the 300th record. Run 1, Event 195600, LumiSection 652 on stream 0 at 27-Jul-2022 12:43:50.879 UTC
== CMSSW: 27-Jul-2022 12:43:51 UTC Closed file root://cmsxrootd-site.fnal.gov//store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/8ADD04E5-1776-E711-A1BA-FA163E6741E0.root
```

[2] link: https://cmsweb-test11.cern.ch/crabserver/ui/task/220727_151454%3Admapelli_crab_20220727_171452

original pset:

```plaintext
[...]
process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring('root://cms-xrd-global.cern.ch///store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/8ADD04E5-1776-E711-A1BA-FA163E6741E0.root'), skipEvents=cms.untracked.uint32(290))
[...]
```

job output:

```plaintext
== CMSSW: 27-Jul-2022 17:17:33 CEST Successfully opened file file:/storage/gpfs_tsm_cms/cms/disk/store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/8ADD04E5-1776-E711-A1BA-FA163E6741E0.root
== CMSSW: Begin processing the 1st record. Run 1, Event 195591, LumiSection 652 on stream 0 at 27-Jul-2022 17:17:50.981 CEST
== CMSSW: Begin processing the 2nd record. Run 1, Event 195592, LumiSection 652 on stream 0 at 27-Jul-2022 17:17:51.444 CEST
== CMSSW: Begin processing the 3rd record. Run 1, Event 195593, LumiSection 652 on stream 0 at 27-Jul-2022 17:17:51.445 CEST
== CMSSW: Begin processing the 4th record. Run 1, Event 195594, LumiSection 652 on stream 0 at 27-Jul-2022 17:17:51.447 CEST
== CMSSW: Begin processing the 5th record. Run 1, Event 195595, LumiSection 652 on stream 0 at 27-Jul-2022 17:17:51.448 CEST
== CMSSW: Begin processing the 6th record. Run 1, Event 195596, LumiSection 652 on stream 0 at 27-Jul-2022 17:17:51.449 CEST
== CMSSW: Begin processing the 7th record. Run 1, Event 195597, LumiSection 652 on stream 0 at 27-Jul-2022 17:17:51.450 CEST
== CMSSW: Begin processing the 8th record. Run 1, Event 195598, LumiSection 652 on stream 0 at 27-Jul-2022 17:17:51.452 CEST
== CMSSW: Begin processing the 9th record. Run 1, Event 195599, LumiSection 652 on stream 0 at 27-Jul-2022 17:17:51.453 CEST
== CMSSW: Begin processing the 10th record. Run 1, Event 195600, LumiSection 652 on stream 0 at 27-Jul-2022 17:17:51.454 CEST
== CMSSW: 27-Jul-2022 17:17:51 CEST Closed file file:/storage/gpfs_tsm_cms/cms/disk/store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/8ADD04E5-1776-E711-A1BA-FA163E6741E0.root
```

[3] link: https://cmsweb-test11.cern.ch/crabserver/ui/task/220727_153633:dmapelli_crab_20220727_173631

original pset:

```plaintext
[...]
process.source = cms.Source("PoolSource", fileNames = cms.untracked.vstring('root://cms-xrd-global.cern.ch///store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/8ADD04E5-1776-E711-A1BA-FA163E6741E0.root'), skipEvents=cms.untracked.uint32(0))
[...]
```

job output:

```plaintext
== CMSSW: 27-Jul-2022 15:41:56 UTC Initiating request to open file root://xrootd.echo.stfc.ac.uk//store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/8ADD04E5-1776-E711-A1BA-FA163E6741E0.root
== CMSSW: 27-Jul-2022 15:42:02 UTC Successfully opened file root://xrootd.echo.stfc.ac.uk//store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/8ADD04E5-1776-E711-A1BA-FA163E6741E0.root
== CMSSW: Begin processing the 1st record. Run 1, Event 195301, LumiSection 652 on stream 0 at 27-Jul-2022 15:42:16.788 UTC
== CMSSW: Begin processing the 2nd record. Run 1, Event 195302, LumiSection 652 on stream 0 at 27-Jul-2022 15:42:16.802 UTC
== CMSSW: Begin processing the 3rd record. Run 1, Event 195303, LumiSection 652 on stream 0 at 27-Jul-2022 15:42:16.804 UTC
[...]
== CMSSW: Begin processing the 298th record. Run 1, Event 195598, LumiSection 652 on stream 0 at 27-Jul-2022 15:42:18.076 UTC
== CMSSW: Begin processing the 299th record. Run 1, Event 195599, LumiSection 652 on stream 0 at 27-Jul-2022 15:42:18.077 UTC
== CMSSW: Begin processing the 300th record. Run 1, Event 195600, LumiSection 652 on stream 0 at 27-Jul-2022 15:42:18.078 UTC
== CMSSW: 27-Jul-2022 15:42:18 UTC Closed file root://xrootd.echo.stfc.ac.uk//store/mc/HC/GenericTTbar/AODSIM/CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/00000/8ADD04E5-1776-E711-A1BA-FA163E6741E0.root
```
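The comparison across [1], [2] and [3] (300 records starting at event 195301 without skipEvents or with skipEvents=0, versus 10 records starting at event 195591 with skipEvents=290) can also be checked automatically. Below is a rough sketch that assumes the "Begin processing the Nth record" line format seen in the job output above; the function name `summarize_records` is hypothetical and not part of CRAB.

```python
import re

# Matches lines such as:
# "Begin processing the 10th record. Run 1, Event 195600, LumiSection 652 ..."
RECORD_RE = re.compile(
    r"Begin processing the (\d+)(?:st|nd|rd|th) record\.\s+"
    r"Run (\d+), Event (\d+), LumiSection (\d+)"
)


def summarize_records(log_text):
    """Return last record ordinal and first/last event number, or None.

    The ordinal of the last record is used instead of counting matched
    lines, because pasted logs may have an elided "[...]" in the middle.
    """
    matches = [
        (int(m.group(1)), int(m.group(3))) for m in RECORD_RE.finditer(log_text)
    ]
    if not matches:
        return None
    return {
        "records": matches[-1][0],
        "first_event": matches[0][1],
        "last_event": matches[-1][1],
    }


sample = (
    "== CMSSW: Begin processing the 1st record. Run 1, Event 195591, "
    "LumiSection 652 on stream 0\n"
    "[...]\n"
    "== CMSSW: Begin processing the 10th record. Run 1, Event 195600, "
    "LumiSection 652 on stream 0\n"
)
print(summarize_records(sample))
```

A job whose first event is not the first event of its input lumi range is a hint that a non-zero skipEvents was propagated.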
mapellidario commented 2 years ago

After some time wondering why I did not see the effects of my changes to TweakPSet.py, I realized I should

I submitted a new task 220728_140056:dmapelli_crab_20220728_160054 and it looks good [1]!

[1] I will add all the details in the PR that I am about to open.

belforte commented 2 years ago

Of course, you need to run ./updateTMRuntime.sh before you start the TW process :-)