changes in weightNames for PS for UL samples

mariadalfonso commented 3 years ago

As reported here https://hypernews.cern.ch/HyperNews/CMS/get/generators/4819/1/1/1.html due to the changes in weightNames for parton shower weights in newer CMSSW need to adjust NANO-AOD weights for all UL samples.

@agrohsje

agrohsje commented 3 years ago

the change comes with pythia 8.240. let me add @dteague and @kdlong.

kdlong commented 3 years ago

For refactored weights I have a fix already: https://github.com/kdlong/cmssw/blob/ImprovedParsingNano/SimDataFormats/GeneratorProducts/src/PartonShowerWeightGroupInfo.cc#L19-L72

For the current Nano production one needs to parse the weights and detect which indices to use from that. Should be good enough to see if they have a colon and an equals sign in them.

mariadalfonso commented 3 years ago

@kdlong Thanks one question: this fix act at which stage of the production ?

kdlong commented 3 years ago

@mariadalfonso, it goes directly in the weightproducts. They aren't really tied to a particular data tier. In the current setup, they are either produced at the GEN step and stored in the file, or recreated before running the NanoAOD if they are missing in the file read in. They could even be added in the re-MiniAOD.

kdlong commented 3 years ago

Hi all,

I'm going to take another look at improving the NanoGen dqm in the coming days. In particular, it should be possible to add plots for the LHE weights to see that the number of weights and their distribution is consistent across productions.

Is there any way to run a NanoGen workflow when testing a GEN-->NanoAOD sample in mcm during central production? If we always ran NanoGEN dqm before submitting a sample, we could catch these problems before samples are produced.

Best,

Kenneth

On Tue, Oct 6, 2020 at 5:05 PM mariadalfonso notifications@github.com wrote:

@kdlong https://github.com/kdlong Thanks one question: this fix act at which stage of the production ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cms-nanoAOD/cmssw/issues/546#issuecomment-704333701, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTCN4DJMKNPX27ZSADL6OLSJMW2NANCNFSM4SF6ZT6Q .

agrohsje commented 3 years ago

Hi Kenneth, we have the MCCM development at 1:30 tomorrow. If you are available for 15', say e.g. from 2:45 to 3:00 or at the beginning, we could discuss there. Let me know, Alexander.

mariadalfonso commented 3 years ago

@kdlong and @agrohsje was there any decision on how to tackle those PS weights in short term ? Please update us and make a PR in master if it's appropriate

kdlong commented 3 years ago

Should just need to change this line:

https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/plugins/GenWeightsTableProducer.cc#L931-L935

The other things to search for are:

fsr:murfac=2.0 fsr:murfac=0.5 isr:murfac=2.0 ifsr:murfac=0.5

this would find all the appropriate weights, but maybe not in the same order as they are now. You could set a flag and adjust the doc string accordingly.

mariadalfonso commented 3 years ago

@kdlong @agrohsje if the fix is understood, who is making the PR ?

mariadalfonso commented 3 years ago

@dteague let us know how is going and the time estimate to deliver the fix.

dteague commented 3 years ago

I talked with Kenneth earlier today to clear up some issue. I have a working code base, but I'm testing/fixing problems that crop up. I should have a PR open before the end of the day

mariadalfonso commented 3 years ago

@ dteague thanks, I saw that you have a branch linked here. what is the status, are you waiting for feedback from @agrohsje @kdlong

@swertz since you are already made modification on those PSweights, do you have some comment ?

dteague commented 3 years ago

what is the status, are you waiting for feedback from @agrohsje @kdlong

At this point, comments would be appreciated. I tried to make the code as general as possible, but I ran into a problem of one of my test files was missing the names in the LumiHeader meaning new code had to be put in as a backup in case this happens. It might be more sensible to go with a more hardcoded approach, but a code review can resolve that question.

As for testing, I tested on about 7 files from this sheet, all of different generators.

agrohsje commented 3 years ago

Hi @dteague . Why does the format depend on the input files? This happens inside P 8.240 and is added on top of LHE. Which problems you see?

dteague commented 3 years ago

Hi @dteague . Why does the format depend on the input files? This happens inside P 8.240 and is added on top of LHE. Which problems you see? I'm not sure if different generators are using different pythia versions or something of the like, but the format for the PS weights is different for certain files.

The default PS weight order is

nominal
Baseline
isrRedHi
fsrRedHi
isrRedLo
fsrRedLo
isrDefHi
fsrDefHi
isrDefLo
fsrDefLo
...

While for certain files, eg /TTJets_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer19UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v2/MINIAODSIM

nominal
Baseline
fsr:murfac=0.707
fsr:murfac=1.414
fsr:murfac=0.5
fsr:murfac=2.0
fsr:murfac=0.25
fsr:murfac=4.0
...

I don't know the exactly what's causing the difference, but the format differences are such that the main variations are not in the spot in the weight vector if the user wants to add just those (ie keepAllPSWeights_==false)

Possibly @kdlong knows the cause of this difference, else I can look into this. Hopefully that answers your question?

agrohsje commented 3 years ago

This is confusing. Do you mind pointing me to a "normal" dataset.

dteague commented 3 years ago

For normal, I meant its the format I found in most of the files I tested. Definitely the second format with fsr:murfac=2.0 is the exception. Some files I tested on that have the isrDefHi are: /ZZTo4L_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v1/MINIAODSIM, /TWZToLL_tlept_Wlept_5f_DR_TuneCP5_PSweights_13TeV-amcatnlo-pythia8/RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v2/MINIAODSIM, and /TTToHadronic_TuneCP5_13TeV-powheg-pythia8/RunIISummer19UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v2/MINIAODSIM

Here has info on these sets as well as a link to the DAS page for convience!

agrohsje commented 3 years ago

Are you sure the PS weights in /TTToHadronic_TuneCP5_13TeV-powheg-pythia8/RunIISummer19UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v2/MINIAODSIM and /TTJets_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIISummer19UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v2/MINIAODSIM are different ? Both are using P 8.240 and the pre-defined PS weights. This would be very confusing and crucial to understand. Can you double-check?

Note that the first two above: /ZZTo4L_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v1/MINIAODSIM, /TWZToLL_tlept_Wlept_5f_DR_TuneCP5_PSweights_13TeV-amcatnlo-pythia8/RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v2/MINIAODSIM are non-UL samples, so they will use P8.230 or smaller and will be different from P8.240. The PR is supposed to handle P8.240 request which is equivalent to in principle all 19UL requests including PS weights.

kdlong commented 3 years ago

Hi Alexander,

My expectation was that there should basically be 2 formats: old-style Pythia, and new-style Pythia. Of course there is a little flexibility for samples that don't have weights at all or a smaller set.

What I understood from discussions with Dylan is that there are a small number of very weird files. @dteague can you post the file/dataset where you saw the issues with the weight names? I thought it was /ZGTo2NuG_TuneCP5_13TeV-amcatnloFXFX-pythia8/RunIIAutumn18MiniAOD-102X_upgrade2018_realistic_v15-v1/MINIAODSIM but looking at a few files here I don't see a problem

dteague commented 3 years ago

@agrohsje I rechecked my samples and I noticed the TTToHadronic was actually a RunIIAutumn18MiniAOD file, not an UltraLegacy, so that's a problem on my part! I tested with 2 other UL files and they had the same format with the fsr:murfac

As for @kdlong 's comment, I think I mentioned the wrong data set there, it was actual the /TTWJetsToLNu_TuneCP5_PSweights_13TeV-amcatnloFXFX-madspin-pythia8/RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/MINIAODSIM set that had the strange features about it. There are weight names missing from the header. You can see my code and the output in this gist. A portion of the PR was devoted to handling this issue

agrohsje commented 3 years ago

Right. There could be a total of 4 cases. Samples with P 8.230 and below, as well as samples with P 8.240 and above. Then we had in 2017 (and sometimes copied to other years) by hand filled ps weights, and samples including the official fragment. For the last 2 cases one could double check if there is a difference. Not sure. In 8.240 this shouldn't appear at all as we encourage people to use the pre-written fragment. So modulo some weird bugs/failures, we should be good if this PR can handle all 4 cases: 2017 samples with hand-written ps weights, 2018 samples with included ps weight fragment, and ul with ps weights in both formats. For UL and hand-written, ideally we have no official example and one would need to generate an example privately. Let me know in case something I wrote is unclear.

kdlong commented 3 years ago

It makes more sense that this sample is the issue, since it at least has the weights entered by hand. I don't immediately see the issue with the header: https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/TOP-RunIIFall17wmLHEGS-00075/0

Hopefully it's a one off? Or are all the hand-pasted PS weight samples the same?

kdlong commented 3 years ago

Ok, looks like the pattern is that all names with a space before them (either \ then space at the next line, or ", name" end up empty. Is this fragment special, or was this the same thing we copy pasted in a lot of places?

agrohsje commented 3 years ago

Ok. I did some more tests. Good news: The new P8.240 logic is robust against wrong spaces. The bad news (as Kenneth wrote): the old is not. As discussed in the HN thread, the old logic searches for the first space and uses everything up to this as the name, i.e. in case there are misleading spaces, the name will just be empty. That means in all samples not using UL or the predefined psweights block (this has no spaces and works well with P8.230), we might have trouble. This should affect a lot of the 2017 samples using ps weights. They all use P8.230 and many include hand-made psweights setting. What the nano-aod code should catch is 1.) all UL samples (I will attach 2 example fragments for testing) 2.) all non-UL samples with pre-defined settings. Any preference about what to do with empty names? The order of the weights and the logic should be stable, but it is for sure ugly what we are doing. Dropping is bad for analyzers, a wrong weight in case of some change as well.

agrohsje commented 3 years ago

Please run cmsdriver for UL including this fragment : https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/TOP-RunIIFall17wmLHEGS-00075/0 to test "hand-written" weights and this fragment to test pre-defined ones: https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/TOP-RunIISummer19UL17wmLHEGEN-00005/0

kdlong commented 3 years ago

Thanks @agrohsje. This is good to know. It means that some of our code in the weight refactor needs to be revisited. @dteague do you mind taking a look at that?

@agrohsje do this UL samples already have MiniAOD? It's easier for us to print out the header from those and to test the nano with nanogen if so.

agrohsje commented 3 years ago

For request with the PS inluded there are plenty of examples: /TTToHadronic_TuneCP5_13TeV-powheg-pythia8/RunIISummer19UL18MiniAOD-106X_upgrade2018_realistic_v11_L1v1-v2/MINIAODSIM The hand-made case not. In fact, there is no way to find. Is there a chance to test with nano-gen (backport to 10_6). My tests were all super fast if nano-gen works for UL samples.

mariadalfonso commented 3 years ago

@dteague @agrohsje @kdlong

any news on this topic ? we need to know if a fix is needed for the V8 production.

agrohsje commented 3 years ago

the ul change should happen before v8 production. maybe @dteague can quickly comment on his status.

dteague commented 3 years ago

Hi all, before today, I thought I should have my changes applied to the Andreas fix branch (PR 35 reference above). In that PR, the code works for all of the cases @agrohsje mentioned above. @kdlong told me this should be addressed to the master branch here, so I've cherry picked the changes and am debugging/retesting the code. I should have these changes pushed and a PR open by tonight

dteague commented 3 years ago

@agrohsje @mariadalfonso @kdlong I ran into a few errors after rerunning the tests, but now its working on all the samples, and now a PR is open

mariadalfonso commented 3 years ago

@dteague Thanks

@swertz TOP is the main user of all these PS weights do you want to do some validation ? If yes report here

mariadalfonso commented 3 years ago

issue solve in cms-sw#31982

cms-nanoAOD / cmssw

changes in weightNames for PS for UL samples #546