JeffersonLab / hps-mc

HPS MC toolkit

Combine events in MadGraph execution so requested number are generated #201

Closed JeremyMcCormick closed 8 months ago

JeremyMcCormick commented 4 years ago

MadGraph often has event underflow when generating A-prime events.

Run MG until enough events are generated, and then use this script to combine them:

https://github.com/eyvindn/MMAPS-madgraph/blob/master/new_madgraph_binaries/Template/LO/bin/internal/merge.pl
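For illustration, the core of what such a merge does can be sketched in a few lines of Python: keep the banner from the first file and concatenate the `<event>` blocks from all of them. This is only a simplified sketch, not the linked merge.pl — in particular it does not recompute the cross-section/event-count information in the banner, which the real script handles.

```python
import re

def merge_lhe(texts):
    """Naively merge LHE file contents (illustrative sketch only).

    Keeps the banner/header of the first file and concatenates the
    <event> blocks from all files. Unlike merge.pl, this does NOT
    update the cross-section or nevents info in the banner.
    """
    event_re = re.compile(r"<event>.*?</event>", re.DOTALL)
    # Everything before the first <event> in the first file is the banner.
    header = texts[0].split("<event>", 1)[0]
    events = []
    for text in texts:
        events.extend(event_re.findall(text))
    return header + "\n".join(events) + "\n</LesHouchesEvents>\n"
```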

This should correctly output the parameters of the combined files.

cbravo135 commented 3 years ago

I think this is just a seed-related issue. A much simpler approach would be a tool that identifies the job_ids which had a "bad seed", automatically changes them to new unique ones, and then resubmits the jobs that need a new seed. I do know that Takashi used to do this by hand.

JeremyMcCormick commented 3 years ago

> I think this is just a seed-related issue. A much simpler approach would be a tool that identifies the job_ids which had a "bad seed", automatically changes them to new unique ones, and then resubmits the jobs that need a new seed. I do know that Takashi used to do this by hand.

I would prefer a procedure that runs the A-prime generator within the Python component until it reaches the requested number of events or loops too many times. These jobs only have a single seed, so each retry would indeed need a different one for the number of output events to change. I'm not sure what scheme should be used there, but it should be deterministic, e.g. adding a fixed offset to the job's seed, so that results are reproducible. Just adding 1 to the seed would not be a good solution unless care was taken in the template to separate the job seed values enough that they weren't duplicated by another job.

Once enough events were generated within the single job, the generated files could all be concatenated together, with the proper header information included, using the script I linked. That strikes me as a better approach than resubmitting jobs over and over again with different seeds, which seems like it would be annoying for the person doing the submitting. It also isn't clear how resubmission would work, since the seed value is already listed in the jobs.json file. Would that be overridden or replaced? That seems like a confusing situation.
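The retry-with-deterministic-seeds idea could look something like the sketch below. All names here are hypothetical (`run_generator` stands in for the MadGraph execution step); the key points are that retry seeds are derived deterministically from the job seed, and that a large fixed stride keeps retry seeds from colliding with other jobs' seeds, provided the template spaces job seeds by less than the stride.

```python
def generate_until_enough(job_seed, n_requested, run_generator,
                          seed_stride=1_000_000, max_attempts=10):
    """Re-run the generator with deterministically derived seeds until
    the requested number of events is reached (hypothetical sketch).

    run_generator(seed) -> number of events produced by that attempt.
    Returns the list of (seed, n_events) per attempt and the total.
    """
    outputs = []
    total = 0
    for attempt in range(max_attempts):
        seed = job_seed + attempt * seed_stride  # deterministic, reproducible
        n_events = run_generator(seed)
        outputs.append((seed, n_events))
        total += n_events
        if total >= n_requested:
            break
    return outputs, total
```

The per-attempt output files would then be handed to the merge step so the job still produces a single file with a consistent header.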

Or you could just generate N additional A-prime events in the first place for your entire sample, based on a rough estimate of the underflow. But sometimes the underflow is quite considerable, e.g. you request 10000 events and only get ~1500 out of the job. This is something we should quantify better through production (@tongtongcao may already have a good idea about it and may even have some rough numbers).
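The padding estimate is simple arithmetic once a rough per-job yield fraction is measured from production (the numbers below are made up for illustration, not measured values):

```python
import math

def padded_request(n_target, yield_fraction, safety=1.1):
    """Events to request so that roughly n_target survive, given an
    estimated fraction of requested events actually produced, plus a
    safety margin. The yield fraction would have to be measured from
    past production jobs."""
    return math.ceil(n_target / yield_fraction * safety)

# E.g. if a job yields ~1500 of 10000 requested (fraction 0.15),
# you would need to request on the order of 70k+ events up front.
```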

Though not optimal, requesting a smaller number of events in each job (like 1000 instead of 10000) and then concatenating might work, but of course this multiplies the overall number of batch jobs required quite considerably.

It would also be good to discover if there are "good" and "bad" seeds. We can then take care not to use bad seeds. :)

tongtongcao commented 3 years ago

In my experience, the number of produced A-prime events depends not just on the seed but also on the A-prime mass. For low A-prime masses, the output hardly ever reaches the 10k requested events, no matter what seed we set. Here is an example for an A-prime mass of 50 MeV/c^2 and beam energy of 4.55 GeV, showing the distribution of the number of produced events for different seeds:

[image: distributions of produced event counts for different seeds]

The A-prime case is special, and we should do concatenation there. For the other physics channels, what we did before is like what Cameron suggested: we try different seeds until the required 10k events is reached for a job. That way we can be sure that all MG files contain 10k events for rad, tritrig, and wab, which makes it convenient to normalize the final samples and do statistical analysis at the various levels of the MC chains.