belforte opened this issue 2 years ago
I checked the logs of the task, this is the good old secondary dataset matching:
I even found an old issue where this was reported: https://github.com/dmwm/CRABServer/issues/5244
Original Ticket and PR:
https://github.com/dmwm/CRABServer/issues/4861
https://github.com/dmwm/CRABServer/pull/4934
Matthias says the algorithm is N^2, but that does not include this line: `lumis & secinfos['lumiobj']`.
thanks @mmascher
IIUC the implementation of the `&` in `lumis & secinfos['lumiobj']` is here:
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/DataStructs/LumiList.py#L184-L212
It contains three nested `for` loops. Considering these are nested inside two other `for` loops, I can see how and why things can get pretty bad.
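For intuition, here is a minimal sketch of what a range-by-range intersection of two run/lumi maps can look like. This illustrates the loop structure only; it is **not** the actual WMCore code, and the data shape (a dict of run number to `(first, last)` lumi ranges) is an assumption:

```python
# Illustration of the nested-loop cost, not the WMCore implementation.
# Assumed shape: {run_number: [(first_lumi, last_lumi), ...]}
def intersect(lumis_a, lumis_b):
    common = {}
    for run, ranges_a in lumis_a.items():            # loop 1: runs
        for a_lo, a_hi in ranges_a:                  # loop 2: ranges in A
            for b_lo, b_hi in lumis_b.get(run, []):  # loop 3: ranges in B
                lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
                if lo <= hi:                         # ranges overlap
                    common.setdefault(run, []).append((lo, hi))
    return common
```

With this shape the cost per call is roughly runs × ranges_A × ranges_B, and in CRAB the call itself sits inside a loop over files and a loop over secondary-dataset entries.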
I think things can be improved: `&` is trying to build the full common runLumiList, but CRAB only needs to know whether there is an overlap, so an overlap-only check could exit early, as sketched below.
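A minimal sketch of what an overlap-only test could look like, assuming the same run → sorted-ranges shape as above (again an assumed data shape, not the LumiList API):

```python
# Early-exit overlap test: returns as soon as any common lumi is found,
# instead of materializing the full intersection.
# Assumed shape: {run_number: sorted [(first_lumi, last_lumi), ...]}
def has_overlap(lumis_a, lumis_b):
    for run, ranges_a in lumis_a.items():
        ranges_b = lumis_b.get(run)
        if not ranges_b:
            continue
        i = j = 0
        # merge-style scan over two sorted range lists: linear per run
        while i < len(ranges_a) and j < len(ranges_b):
            a_lo, a_hi = ranges_a[i]
            b_lo, b_hi = ranges_b[j]
            if a_lo <= b_hi and b_lo <= a_hi:
                return True
            if a_hi < b_hi:
                i += 1
            else:
                j += 1
    return False
```

On sorted ranges this is linear per run rather than quadratic, and the early return means a positive match costs almost nothing.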
Yeah... I am quite confident that a leaner algorithm can be identified, but so far this is quite rare. I am a bit more puzzled by the Splitter taking a long time, but it is possible that it will not happen anymore after #6691.
Sometimes DBSDataDiscovery or the Splitter runs for hours. While rare, there is no clear understanding of why it happens, nor any control when it does, so we are exposed to problems if usage patterns change. A recent example with DBSDataDiscovery took 5+ hours to associate files with the corresponding secondary dataset parents in this task:
210916_135021:sbaradia_crab_EmulatedTagAndProbe_DYToLL_M-50_112X_mcRun3_2021_realistic_v16-v2_CMSSW_11_2_2_patch1
We should look for ways to speed up the code, or return an error and force saner inputs, or whatever. But it would be good to be able to put a time limit on every slave action.
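One possible shape for such a limit, sketched with the standard `signal` module; the wrapper name and the idea of applying it around each slave action are assumptions, not existing TaskWorker code (and `SIGALRM` only works in the main thread, on Unix):

```python
import signal

class ActionTimeout(Exception):
    """Raised when a wrapped action exceeds its time budget."""

def run_with_timeout(func, args, seconds):
    # Hypothetical wrapper: abort `func` if it runs longer than `seconds`.
    def _handler(signum, frame):
        raise ActionTimeout("action exceeded %ds" % seconds)
    old_handler = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)
    try:
        return func(*args)
    finally:
        signal.alarm(0)                            # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```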