Open haozturk opened 3 years ago
This is the backfill: https://cmsweb.cern.ch/reqmgr2/fetch?rid=haozturk_task_TSG-Phase2HLTTDRWinter20GS-Backfill-00276__v1_T_210223_202244_6379
To adjust the job splitting, I used the ReqMgr GUI and divided events_per_job by 8. Another detail is that I used the hepcloud team instead of backfill, since the backfill agents are at CERN and cannot work with the T3_US_ANL site.
This backfill had some issues due to wallclock time constraints. This time we will try again by dividing events_per_job by 20, so we'll get 2000/20=100 events per job, and monitor its status.
New backfill: https://cmsweb.cern.ch/reqmgr2/fetch?rid=haozturk_task_TSG-Phase2HLTTDRWinter20GS-Backfill-00276__v1_T_210324_191726_5368
We assigned one production workflow to ANL: https://cmsweb.cern.ch/reqmgr2/fetch?rid=haozturk_task_TSG-SnowmassWinter21wmLHEGEN-00008__v1_T_210330_084302_349
The job splitting is adjusted such that events_per_job is divided by 30: 9000 --> 300
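For reference, the splitting arithmetic used in the comments above can be sketched as follows. This is only an illustration of the division; the function name is hypothetical and it is not part of ReqMgr or WMCore.

```python
def scaled_events_per_job(events_per_job: int, divisor: int) -> int:
    """Divide events_per_job by an integer factor, never going below 1.

    Hypothetical helper for illustration only; not a ReqMgr/WMCore API.
    """
    if events_per_job <= 0 or divisor <= 0:
        raise ValueError("events_per_job and divisor must be positive")
    # Integer division, so the result is a valid whole number of events.
    return max(1, events_per_job // divisor)


# Values quoted in this thread:
print(scaled_events_per_job(2000, 20))  # backfill: 100
print(scaled_events_per_job(9000, 30))  # production workflow: 300
```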
These workflows have failed due to various issues. Dirk applied some fixes on the site end, so we resubmitted the workflows:
The prod workflow: https://cmsweb.cern.ch/reqmgr2/fetch?rid=haozturk_task_TSG-SnowmassWinter21wmLHEGEN-00008__v1_T_210421_122144_910
Backfill: https://cmsweb.cern.ch/reqmgr2/fetch?rid=haozturk_task_TSG-Phase2HLTTDRWinter20GS-Backfill-00276__v1_T_210421_121706_1714
Impact of the new feature: HPC sites, T3_US_ANL in particular
Is your feature request related to a problem? Please describe. There is a new HEPCloud site for which custom settings are required in workflow assignment. One of these custom settings is job splitting. This site requires jobs with at most 6h of wall time. That's why we need to adjust the job splitting so that we create smaller jobs that fit within this limit.
Describe the solution you'd like Create a tool/script which helps us to customize job splitting in workflow assignment.
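As a rough starting point for such a tool, here is a minimal sketch of how the target events_per_job could be derived from the 6h wall-time limit. It assumes we have an estimate of the time per event in seconds; the function names, the 80% safety margin and the 60 s/event figure are all hypothetical and only there to show the calculation. It does not talk to ReqMgr.

```python
import math

SIX_HOURS_S = 6 * 3600  # wall-time limit mentioned for the site


def events_per_job_for_walltime(time_per_event_s: float,
                                max_walltime_s: int = SIX_HOURS_S,
                                safety_percent: int = 80) -> int:
    """Largest events_per_job expected to fit in the wall-time budget.

    safety_percent is an assumed margin (use only part of the limit).
    """
    if time_per_event_s <= 0:
        raise ValueError("time_per_event_s must be positive")
    if not 0 < safety_percent <= 100:
        raise ValueError("safety_percent must be in (0, 100]")
    budget_s = max_walltime_s * safety_percent // 100
    return max(1, int(budget_s // time_per_event_s))


def splitting_divisor(current_events_per_job: int,
                      target_events_per_job: int) -> int:
    """Smallest integer divisor that brings the current value to or below the target."""
    if current_events_per_job <= 0 or target_events_per_job <= 0:
        raise ValueError("events_per_job values must be positive")
    return max(1, math.ceil(current_events_per_job / target_events_per_job))


# Example with an assumed 60 s/event and the 9000 events_per_job workflow:
target = events_per_job_for_walltime(60)
print(target)                           # 288
print(splitting_divisor(9000, target))  # 32
```

The divisor is rounded up so that the resulting jobs stay at or below the target size; the actual time per event would have to come from the request itself or from monitoring.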
Describe alternatives you've considered None
Additional context Currently the issue is being discussed on Slack; I will update this issue as we proceed. @z4027163 @amaltaro @todor-ivanov @drkovalskyi FYI