Closed matt-komm closed 1 year ago
Hi @matt-komm, I'm wondering if you were ever in contact with someone about this? I have centrally produced signal samples that have entirely empty LHEPdfWeight branches and I suspect this is the reason why. I actually found this issue through this [1] HN thread. I've asked there if anyone has been in contact with developers but wanted to ping here as well.
Thanks!
[1] - https://hypernews.cern.ch/HyperNews/CMS/get/generators/4710/1/3/1/1.html
Adding a few people to make sure they're following, since this seems to affect quite a few analyses and I haven't seen any action being taken.
Could you please suggest a way forward?
@agrohsje @qliphy @peruzzim @fgolf
BTW, I tried to fix this myself but it turns out that the LHE header generated by MG is not properly formatted. Hence automatic parsing is not really possible. I am wondering if this is a known issue. In the meantime I am manually saving the weights I am interested in for analysis ...
Thanks for sharing. Would you mind sending how you did this? I'm not seeing any progress from experts (please feel free to chime in if there's a private conversation happening :-) ) so I'd like to look into this for myself.
@lcorcodilos I just wrote a simple plugin https://github.com/LLPDNNX/LLPReco/blob/master/NANOProducer/plugins/LHEWeightsProducer.cc which can then be instructed to store various weights if you know their ID, e.g. https://github.com/LLPDNNX/LLPReco/blob/master/NANOProducer/test/testGenWeight.py#L141-L178 This is more of a brute force, intermediate solution where one needs to manually inspect the LHE header to figure out which ID correspond to what variation for each sample ...
Hi guys, please have a look at this, as it may fix your problem with the different MG version: https://github.com/arizzi/cmssw/commit/34f6e8feb1c6c0bd52add4a02ae2b7f8a309d816
cheers, Andrea
On Thu, Jun 25, 2020 at 1:13 PM Matthias Komm notifications@github.com wrote:
@lcorcodilos https://github.com/lcorcodilos I just wrote a simple plugin https://github.com/LLPDNNX/LLPReco/blob/master/NANOProducer/plugins/LHEWeightsProducer.cc which can then be instructed to store various weights if you know their ID, e.g. https://github.com/LLPDNNX/LLPReco/blob/master/NANOProducer/test/testGenWeight.py#L141-L178 This is more of a brute force, intermediate solution where one needs to manually inspect the LHE header to figure out which ID correspond to what variation for each sample ...
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/cms-nanoAOD/cmssw/issues/520#issuecomment-649474265, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGKFFSFLCC2HNGDVHWSDN3RYMWJ3ANCNFSM4NHUVXOA .
the reason for not using regular XML parser is because generators produce(d) invalid XML (e.g. opening a tag before the previous one was closed)
Dear all; May I kindly ping here to check whether any progress has been made fixing this? Thanks
Who is fixing this in the conversion from Mini to NANO-AOD? If properly configured, MG5.265 doesn't show any problems. We discussed the problem also w./ authors. For Powheg, xml check was already long time ago added to the scripts, should extend the MG5 tools in that direction.
It's not clear to me how to fix it centrally, if the generator content is in a broken format. I understand this is the case, it's not just adding another regexp (which is, by the way, a model we are trying to abandon). Last time I checked, lines were missing or placed in a wrong order, which makes it impossible to guess what the right thing should be without sample-specific information. The fact that the issue does not appear with the latest version of the generator does not fix the existing samples - what's the recommendation for using those?
The only solution I see is to preprocess the header for this specific issue (for the first weight, which is now written into it's own group, the closing tag is written in the wrong place) and force it back to valid (or mostly valid...) xml before parsing it. An implementation by @dteague for this is here: https://github.com/kdlong/cmssw/blob/ImprovedParsing/GeneratorInterface/Core/src/LHEWeightHelper.cc#L103-L126
Indeed I would strongly prefer to force the generator to produce valid xml in future versions.
Thanks Kenneth. @peruzzim: Can this be adapted for NANO-AOD for the time being? Redoing samples isn't justified. I started modifying our scripts so that we catch possible xml bugs in new mg5 versions: https://github.com/cms-sw/genproductions/pull/2673
From NanoAODv9
I see the branches the branches LHEScaleWeight
and LHEPdfWeight
with what it seems to be reasonable content. Can someone confirm If this issue was fixed?
we'll have a related discussion next week : https://indico.cern.ch/event/1167738/
please follow up in https://github.com/cms-sw/cmssw/pull/32167
Hi,
I am trying to produce nanoaod from a private 2016 miniaodv3 signal sample produced with MadGraph2.6.5. The https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/NanoAOD/plugins/GenWeightsTableProducer.cc does not seem to pick up the scale and reweighting weights correctly. In particular the following regex' fail:
std::regex weightgroupRwgt("<weightgroup\\s+(?:name|type)=\"(.*)\"\\s*>");
matches<weightgroup name="mg_reweighting" weight_name_strategy="includeIdInWeightName">
but the outer most quotes are being captured resulting inmg_reweighting" weight_name_strategy="includeIdInWeightName
being interpreted asgroupname
. A temporary fix I found is to use<weightgroup\\s+(?:name|type)=\"(\\S+)\".*>
instead.std::regex scalewmg26x("<weight\\s+(?:.*\\s+)?id=\"(\\d+)\"\\s*(?:lhapdf=\\d+|dyn=\\s*-?\\d+)?\\s*((?:[mM][uU][rR]|renscfact)=\"(\\S+)\"\\s+(?:[mM][uU][Ff]|facscfact)=\"(\\S+)\")(\\s+.*)?</weight>");
does not match<weight MUF="0.5" MUR="0.5" PDF="325300" id="2"> MUR=0.5 MUF=0.5 </weight>
since at least theid
andMUF
,MUR
attributes are in a different order.I am curious why the xml is parsed here through a set of regex'. This seems to lead to some inflexibility if e.g. attributes are in a different order. Is there a reason not to use a common xml library, e.g. expat or xerces? I could imagine that the LHE header does not follow xml syntax precisely.
Here is one example file: root://gfe02.grid.hep.ph.ic.ac.uk/pnfs/hep.ph.ic.ac.uk/data/cms/store/user/mkomm/HNL/miniaod16v3_200517/HNL_dirac_all_ctau1p0e00_massHNL6p0_Vall6p496e-03/miniaod16v3_200517/200517_004822/0000/HNL2016_140.root
Cheers