CMSCompOps / WmAgentScripts

CMS Workflow Team Scripts

pLHEGEN requests getting wrong number of events #970

Open jordan-martins opened 2 years ago

jordan-martins commented 2 years ago

Hi all,

this is a thread to investigate something odd happening with the number of events in the pLHEGEN requests in the 20UL campaigns. I suspect that something is not tuned correctly in the stepchain conversion: it probably mishandles the filter efficiency of the first task (i.e. the pLHEGEN one). This task has two sequences within it. The first converts an LHE file to a ROOT file, and its filter efficiency is always 1. The second is the GEN sequence, which can have a GEN filter and therefore does not always have a filter efficiency of 1.

Some requests to serve as examples:

https://dmytro.web.cern.ch/dmytro/cmsprodmon/workflows.php?prep_id=task_BPH-RunIISummer20UL16pLHEGEN-00080
https://dmytro.web.cern.ch/dmytro/cmsprodmon/workflows.php?prep_id=task_BPH-RunIISummer20UL16pLHEGEN-00081
https://dmytro.web.cern.ch/dmytro/cmsprodmon/workflows.php?prep_id=task_BPH-RunIISummer20UL16pLHEGENAPV-00080
https://dmytro.web.cern.ch/dmytro/cmsprodmon/workflows.php?prep_id=task_BPH-RunIISummer20UL16pLHEGENAPV-00081
https://dmytro.web.cern.ch/dmytro/cmsprodmon/workflows.php?prep_id=task_BPH-RunIISummer20UL17pLHEGEN-00081
https://dmytro.web.cern.ch/dmytro/cmsprodmon/workflows.php?prep_id=task_BPH-RunIISummer20UL17pLHEGEN-00082
https://dmytro.web.cern.ch/dmytro/cmsprodmon/workflows.php?prep_id=task_BPH-RunIISummer20UL18pLHEGEN-00083
https://dmytro.web.cern.ch/dmytro/cmsprodmon/workflows.php?prep_id=task_BPH-RunIISummer20UL18pLHEGEN-00086

Could you kindly investigate this, please?

Thanks, Jordan

jordan-martins commented 2 years ago

Hi, here is some concrete math for this request:

https://dmytro.web.cern.ch/dmytro/cmsprodmon/workflows.php?prep_id=task_BPH-RunIISummer20UL16pLHEGEN-00084

We requested 360k events, but the workflow closed out with 13k, treating that as the correct amount. The LHE efficiency is 1 and the GEN efficiency is 0.0362, so ~360k * 0.0362 = ~13k. It would have worked if we had placed in McM the total number of events contained in the LHE files; then, after the stepchain conversion with filter_eff, it would have produced the ~360k events. However, I think this approach would not work if the workflow ran as a taskchain, since it would be impossible for the first task to match the number of events requested (or perhaps the system would be smart enough to project the number of events based on the filter efficiency, regardless of the number we pass?).
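The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers quoted in this comment; the function names are hypothetical and are not part of McM, WMCore, or WmAgentScripts.

```python
import math


def events_after_filter(input_events: int, filter_eff: float) -> int:
    """Events expected to survive a filter with the given efficiency."""
    return round(input_events * filter_eff)


def input_events_needed(requested_events: int, filter_eff: float) -> int:
    """Input events needed so that about `requested_events` survive the filter."""
    return math.ceil(requested_events / filter_eff)


requested = 360_000   # events requested in McM
lhe_eff = 1.0         # LHE sequence: filter efficiency is always 1
gen_eff = 0.0362      # GEN sequence: filter efficiency of this request

# What was observed: the requested number is treated as the input and then filtered.
produced = events_after_filter(requested, lhe_eff * gen_eff)

# What would be needed as input to end up with about 360k events after the GEN filter.
needed = input_events_needed(requested, lhe_eff * gen_eff)

print(f"produced ~{produced} events; ~{needed} input events would be needed")
```

Under these numbers the first value matches the ~13k seen at close-out, and the second is on the order of 10M input events.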

Check this one, for example: https://dmytro.web.cern.ch/dmytro/cmsprodmon/workflows.php?prep_id=task_BPH-RunIIFall18pLHE-00004. It is different from the above because the pLHE and GS steps happen in different campaigns. The user places the correct number (16M) in the pLHE request, and in the GS request the user places in McM the correct value given the filter efficiency (0.1143), which gives ~1.8M events of possible output. When the full workflow is submitted, everything happens as expected.

In any case, I want to check on our side what happens if we make the pLHE request using the number of events in the LHE files.

I hope I could explain what is happening... not sure... Please let us know what you think.

Thanks, Jordan

FYI @amaltaro @haozturk

jordan-martins commented 2 years ago

Hi @haozturk, can you confirm that, when you evaluate the number of events, you always multiply by a given filter efficiency? Best, Jordan

haozturk commented 2 years ago

Hi @jordan-martins, to better analyze the situation, I submitted the same workflow in taskchain mode. With that, we'll understand whether the issue is related to the conversion, and we'll have more data to assess. It's a bit hard to do now since the previous workflows are archived. Anyway, my investigation continues without waiting for the result of the taskchain request. Once I have something concrete, I'll share it with you.

haozturk commented 2 years ago

I want to share one observation: for the request from 2018, the LHE output was kept, so you can see that the LHE output contains NumRequestedEvents/FilterEfficiencyOfGen events (16M). It therefore seems that GEN takes this output as input and filters it without generating anything, producing the right number of events. In the current requests, I cannot see how many events the LHE datasets contain, since keepOutput is False. Taking this workflow as an example, if the LHE dataset contains 360k events instead of 360k/FilterEfficiencyOfGen, this might explain the situation. If we can capture this information with the new workflow, it might confirm or rule out this hypothesis.
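Once the LHE event count is available, the check described above could look roughly like the sketch below. The function name, the labels, and the 5% tolerance are assumptions made for illustration; nothing here queries DBS or ReqMgr, and the event count has to be supplied by hand.

```python
def diagnose_lhe_count(lhe_events: int, requested_events: int,
                       gen_filter_eff: float, rel_tol: float = 0.05) -> str:
    """Compare an observed LHE event count against the two scenarios.

    "scaled":   LHE holds about requested_events / gen_filter_eff events,
                so GEN can filter down to the requested number.
    "unscaled": LHE holds about requested_events events, so the GEN output
                falls short by the filter efficiency.
    """
    scaled = requested_events / gen_filter_eff
    if abs(lhe_events - scaled) <= rel_tol * scaled:
        return "scaled"
    if abs(lhe_events - requested_events) <= rel_tol * requested_events:
        return "unscaled"
    return "inconclusive"


# 2018 request: 16M LHE events, ~1.8M requested after a 0.1143 GEN filter.
print(diagnose_lhe_count(16_000_000, 1_828_800, 0.1143))

# Hypothesis for the current request: the LHE step produced only 360k events.
print(diagnose_lhe_count(360_000, 360_000, 0.0362))
```

If the new taskchain workflow keeps its LHE output, its event count could be fed to the same comparison.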