cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0

Reading old files containing `std::auto_ptr<gen::PdfInfo>` #43422

Open makortel opened 1 year ago

makortel commented 1 year ago

PR https://github.com/cms-sw/cmssw/pull/43405 reminded us about the iorules in SimDataFormats/GeneratorProducts/src/classes_def.xml that read std::auto_ptr<gen::PdfInfo> from files predating CMSSW_11_0_0. std::auto_ptr was removed in C++17, although libstdc++ still provides the implementation for backwards compatibility (link) while issuing a deprecation warning. We should nevertheless figure out a way to avoid using std::auto_ptr in the dictionaries (e.g. in case we have to use a standard library without std::auto_ptr some day).

For reference, here are pointers to various discussions from the time the iorules to convert the std::auto_ptr<gen::PdfInfo> to std::unique_ptr<gen::PdfInfo> were introduced

makortel commented 1 year ago

assign core

cmsbuild commented 1 year ago

New categories assigned: core

@Dr15Jones,@makortel,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

cmsbuild commented 1 year ago

A new Issue was created by @makortel Matti Kortelainen.

@makortel, @Dr15Jones, @rappoccio, @antoniovilela, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel commented 1 year ago

@cms-sw/generators-l2 @cms-sw/xpog-l2 Could you confirm if GenEventInfoProduct and/or LHEEventProduct data products from 10_6_X (Run 2 UL) AODSIM need to be readable in master for future MiniAOD/NanoAOD productions?

The PR test failures in https://github.com/cms-sw/cmssw/pull/43405#issuecomment-1828428581 occurred in

makortel commented 12 months ago

type root

vlimant commented 12 months ago

Extremely likely, until we do a full Run2 reprocessing and MC (reconstruction from RAW), i.e. not any time soon.

makortel commented 12 months ago

Thanks @vlimant, this is pretty much what I would have guessed (but wanted to make sure).

makortel commented 12 months ago

@pcanal We would need an ability to read an old file, and evolve std::auto_ptr<T> (for one specific T=gen::PdfInfo) into something that we can further evolve into std::unique_ptr<T> (or directly to unique_ptr), in some way that avoids explicit use of std::auto_ptr<T> (since it has been removed from the standard, even if libstdc++ still seems to provide the implementation out of the box).

pcanal commented 12 months ago

For better (or worse, but better in this case), the read rule does not check the type listed in the rules against the incoming class, so all you need is:

namespace Compatibility {
template <typename T>
struct auto_ptr_is_deprecated
{
   ~auto_ptr_is_deprecated() { delete _M_ptr; }
   void release() { _M_ptr = nullptr; }
   T * get() { return _M_ptr; }

   T *_M_ptr = nullptr;
};
} // Compatibility

and

 <ioread sourceClass = "GenEventInfoProduct" version="[11]" targetClass="GenEventInfoProduct" source = "Compatibility::auto_ptr_is_deprecated<gen::PdfInfo> pdf_;" target="pdf_">
   <![CDATA[pdf_.reset(onfile.pdf_.get()); onfile.pdf_.release();]]>
 </ioread>

With this style, you need to make 100% sure that the layout of auto_ptr_is_deprecated is exactly the same as the layout implied by the TStreamerInfo for auto_ptr that is stored in the file being read.
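The layout requirement can be partly guarded at compile time. The following is a minimal sketch, not the CMSSW implementation: it assumes the on-file std::auto_ptr<T> consists of a single T* data member named _M_ptr (as in libstdc++), and it uses a hypothetical Payload type in place of gen::PdfInfo. The static_assert lines only check the shim against that assumption; they cannot check it against the TStreamerInfo actually stored in a given file. The adopt function (a name introduced here for illustration) mirrors the body of the read rule.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

namespace Compatibility {
  // Stand-in for the on-file std::auto_ptr<T>. The member name _M_ptr is
  // kept because it mirrors the data member name used by libstdc++.
  template <typename T>
  struct auto_ptr_is_deprecated {
    ~auto_ptr_is_deprecated() { delete _M_ptr; }
    void release() { _M_ptr = nullptr; }
    T* get() { return _M_ptr; }

    T* _M_ptr = nullptr;
  };
}  // namespace Compatibility

// Hypothetical payload type standing in for gen::PdfInfo.
struct Payload {
  int value;
};

using Shim = Compatibility::auto_ptr_is_deprecated<Payload>;

// Assumption: the stored layout is exactly one raw pointer at offset 0.
static_assert(std::is_standard_layout<Shim>::value, "shim must be standard layout");
static_assert(sizeof(Shim) == sizeof(Payload*), "shim must hold exactly one pointer");
static_assert(offsetof(Shim, _M_ptr) == 0, "pointer must sit at offset 0");

// Same two steps as the read rule: take the pointer, then make the
// on-file object forget it so that it is deleted only once.
inline void adopt(std::unique_ptr<Payload>& target, Shim& onfile) {
  target.reset(onfile.get());
  onfile.release();
  assert(onfile.get() == nullptr);
}
```

If the shim later gains a second data member or a virtual function, the assertions fail at build time instead of leaving the mismatch to be discovered when reading files.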

Alternatively you can have a class name ::auto_ptr for which you generate a dictionary. In that case the layout does not have to be strictly the same.

makortel commented 12 months ago

Thanks @pcanal.

> With this style, you need to make 100% sure that the layout of auto_ptr_is_deprecated is exactly the same as the layout implied by the TStreamerInfo for auto_ptr that is stored in the file being read.

What would happen if the layout would not be exactly the same? Would we get an error, or would data be misinterpreted silently?

pcanal commented 12 months ago

Unfortunately the data would (possibly) be interpreted incorrectly, and silently: the memory will be allocated and set as described by the TStreamerInfo and, for all intents and purposes, reinterpret_cast to the stated type.
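As an analogy only (this is not ROOT code), the sketch below shows why such a mismatch goes unnoticed. The two struct names and the helper function are hypothetical: bytes filled according to one layout are copied into a type that declares the same members in a different order, and the values simply end up in the wrong fields without any error.

```cpp
#include <cstdint>
#include <cstring>

// Layout used when the bytes were written (plays the role of the
// layout described by the TStreamerInfo).
struct OnFileLayout {
  std::uint32_t first;
  std::uint32_t second;
};

// Layout the reading code claims to have: same size, members swapped.
struct ClaimedLayout {
  std::uint32_t second;
  std::uint32_t first;
};

static_assert(sizeof(OnFileLayout) == sizeof(ClaimedLayout),
              "the demonstration needs equally sized types");

// Copies the raw bytes into the claimed type, which is the well-defined
// equivalent of reinterpreting the memory. Nothing checks member names.
inline ClaimedLayout reinterpretBytes(const OnFileLayout& stored) {
  ClaimedLayout out;
  std::memcpy(&out, &stored, sizeof(out));
  return out;
}
```

Feeding in `OnFileLayout{1u, 2u}` yields an object whose `first` member holds 2 and whose `second` member holds 1, which is the kind of silent misreading described above.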

Dr15Jones commented 9 months ago

Unfortunately, using a different class name (i.e. hepmc_iorule::auto_ptr_is_deprecated) in the iorule source did not work. The JIT still happens and I get the following messages from ROOT

Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 09-Feb-2024 08:46:10.671 CST
input_line_57:5:26: warning: 'auto_ptr<gen::PdfInfo>' is deprecated: use 'std::unique_ptr' instead [-Wdeprecated-declarations]
         *ret = new std::auto_ptr<gen::PdfInfo>;
                         ^
/data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_0_pre3-el8_amd64_gcc12/build/CMSSW_14_0_0_pre3-build/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/backward/auto_ptr.h:287:7: note: 'auto_ptr<gen::PdfInfo>' has been explicitly marked deprecated here
    } _GLIBCXX11_DEPRECATED_SUGGEST("std::unique_ptr");
      ^
/data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_0_pre3-el8_amd64_gcc12/build/CMSSW_14_0_0_pre3-build/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/x86_64-redhat-linux-gnu/bits/c++config.h:104:45: note: expanded from macro '_GLIBCXX11_DEPRECATED_SUGGEST'
# define _GLIBCXX11_DEPRECATED_SUGGEST(ALT) _GLIBCXX_DEPRECATED_SUGGEST(ALT)
                                            ^
/data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_0_pre3-el8_amd64_gcc12/build/CMSSW_14_0_0_pre3-build/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/x86_64-redhat-linux-gnu/bits/c++config.h:96:19: note: expanded from macro '_GLIBCXX_DEPRECATED_SUGGEST'
  __attribute__ ((__deprecated__ ("use '" ALT "' instead")))
                  ^

and the job terminates with the exception

----- Begin Fatal Exception 09-Feb-2024 08:43:29 CST-----------------------                                                                                                                                                                        
An exception of category 'FileReadError' occurred while                                                                                                                                                                                            
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0                                                                                                                                                                                         
   [1] Running path 'e'                                                                                                                                                                                                                            
   [2] Prefetching for module CheckGenEventInfoProduct/'check'                                                                                                                                                                                     
   [3] While reading from source GenEventInfoProduct dummy '' TEST                                                                                                                                                                                 
   [4] Reading branch GenEventInfoProduct_dummy__TEST.                                                                                                                                                                                             
   Additional Info:                                                                                                                                                                                                                                
      [a] Fatal Root Error: @SUB=TClass::BuildRealData                                                                                                                                                                                             
Inspection for auto_ptr<gen::PdfInfo> not supported!                                                                                                                                                                                               

----- End Fatal Exception -------------------------------------------------                                                                                                                                                                        

Running in gdb I find the exception comes from

#2  0x00007ffff6e0ab2b in ErrorHandler () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libCore.so                                                                                             
#3  0x00007ffff6d5a3e4 in TObject::Error(char const*, char const*, ...) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libCore.so                                                      
#4  0x00007ffff6e27b3f in TClass::BuildRealData(void*, bool) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libCore.so                                                                       
#5  0x00007ffff71d4068 in TStreamerInfo::BuildOld() () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libRIO.so                                                                                 
#6  0x00007ffff6e19538 in TClass::GetStreamerInfoImpl(int, bool) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libCore.so                                                             
#7  0x00007ffff6e1981e in TClass::GetStreamerInfo(int, bool) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libCore.so                                                                 
#8  0x00007ffff711503d in TBufferFile::ReadVersion(unsigned int*, unsigned int*, TClass const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libRIO.so                                     
#9  0x00007ffff7116eff in TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libRIO.so                                         
#10 0x00007ffff7355f48 in int TStreamerInfo::ReadBuffer<TVirtualArray>(TBuffer&, TVirtualArray const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) ()                                                                              
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libRIO.so
#11 0x00007ffff73571df in int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) ()                                                                                            
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libRIO.so
#12 0x00007ffff71dd94d in TStreamerInfoActions::GenericReadAction(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libRIO.so     
#13 0x00007ffff722ec5a in TStreamerInfoActions::UseCache(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libRIO.so              
#14 0x00007ffff7116eae in TBufferFile::ReadClassBuffer(TClass const*, void*, TClass const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libRIO.so                                         
#15 0x00007ffff735bdcc in int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) ()                                                                                            
   from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libRIO.so
#16 0x00007ffff71dd94d in TStreamerInfoActions::GenericReadAction(TBuffer&, void*, TStreamerInfoActions::TConfiguration const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libRIO.so     
#17 0x00007ffff710ebb5 in TBufferFile::ApplySequence(TStreamerInfoActions::TActionSequence const&, void*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libRIO.so                           
#18 0x00007ffff7872b87 in TBranchElement::ReadLeavesMember(TBuffer&) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libTree.so                                                               
#19 0x00007ffff786b429 in TBranch::GetEntry(long long, int) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libTree.so                                                                        
#20 0x00007ffff787dd44 in TBranchElement::GetEntry(long long, int) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libTree.so                                                                 
#21 0x00007ffff787dcfd in TBranchElement::GetEntry(long long, int) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/external/el8_amd64_gcc12/lib/libTree.so                                                                 
#22 0x00007fffcccd685c in edm::RootTree::getEntry(TBranch*, long long) const () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/lib/el8_amd64_gcc12/pluginIOPoolInput.so                                                      
#23 0x00007fffcccb739c in edm::RootDelayedReader::getProduct_(edm::BranchID const&, edm::EDProductGetter const*) () from /cvmfs/cms.cern.ch/el8_amd64_gcc12/cms/cmssw/CMSSW_14_0_0_pre3/lib/el8_amd64_gcc12/pluginIOPoolInput.so

Dr15Jones commented 9 months ago

Using ::auto_ptr also does not work as at run time I get the error

Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 0 at 09-Feb-2024 09:05:49.444 CST                                                                                                                                         
input_line_57:5:21: error: reference to 'auto_ptr' is ambiguous                                                                                                                                                                                    
         *ret = new auto_ptr<gen::PdfInfo>;                                                                                                                                                                                                        
                    ^                                                                                                                                                                                                                              
SimDataFormatsGeneratorProducts_xr dictionary payload:102:8: note: candidate found by name lookup is 'auto_ptr'                                                                                                                                    
struct auto_ptr
       ^
/data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_0_pre3-el8_amd64_gcc12/build/CMSSW_14_0_0_pre3-build/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/unique_ptr.h:64:28: note: candidate found by name lookup is 'std::auto_ptr'
  template<typename> class auto_ptr;
                           ^
input_line_57:8:21: error: reference to 'auto_ptr' is ambiguous
         *ret = new auto_ptr<gen::PdfInfo>[nary];
                    ^
SimDataFormatsGeneratorProducts_xr dictionary payload:102:8: note: candidate found by name lookup is 'auto_ptr'
struct auto_ptr
       ^
/data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_0_pre3-el8_amd64_gcc12/build/CMSSW_14_0_0_pre3-build/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/unique_ptr.h:64:28: note: candidate found by name lookup is 'std::auto_ptr'
  template<typename> class auto_ptr;
                           ^
input_line_57:13:29: error: reference to 'auto_ptr' is ambiguous
         *ret = new (arena) auto_ptr<gen::PdfInfo>;
                            ^
SimDataFormatsGeneratorProducts_xr dictionary payload:102:8: note: candidate found by name lookup is 'auto_ptr'
struct auto_ptr
       ^
/data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_0_pre3-el8_amd64_gcc12/build/CMSSW_14_0_0_pre3-build/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/unique_ptr.h:64:28: note: candidate found by name lookup is 'std::auto_ptr'
  template<typename> class auto_ptr;
                           ^
input_line_57:16:29: error: reference to 'auto_ptr' is ambiguous
         *ret = new (arena) auto_ptr<gen::PdfInfo>[nary];
                            ^
SimDataFormatsGeneratorProducts_xr dictionary payload:102:8: note: candidate found by name lookup is 'auto_ptr'
struct auto_ptr
       ^
/data/cmsbld/jenkins/workspace/auto-builds/CMSSW_14_0_0_pre3-el8_amd64_gcc12/build/CMSSW_14_0_0_pre3-build/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/unique_ptr.h:64:28: note: candidate found by name lookup is 'std::auto_ptr'
  template<typename> class auto_ptr;

----- Begin Fatal Exception 09-Feb-2024 09:05:49 CST-----------------------
An exception of category 'FileReadError' occurred while
   [0] Processing  Event run: 1 lumi: 1 event: 1 stream: 0
   [1] Running path 'e'
   [2] Prefetching for module CheckGenEventInfoProduct/'check'
   [3] While reading from source GenEventInfoProduct dummy '' TEST
   [4] Reading branch GenEventInfoProduct_dummy__TEST.
   Additional Info:
      [a] Fatal Root Error: @SUB=TClingCallFunc::make_ctor_wrapper
Failed to compile
  ==== SOURCE BEGIN ====
__attribute__((used)) extern "C" void __ctor_2(void** ret, void* arena, unsigned long nary)
{
   if (!arena) {
      if (!nary) {
         *ret = new auto_ptr<gen::PdfInfo>;
      }
      else {
         *ret = new auto_ptr<gen::PdfInfo>[nary];
      }
   }
   else {
      if (!nary) {
         *ret = new (arena) auto_ptr<gen::PdfInfo>;
      }
      else {
         *ret = new (arena) auto_ptr<gen::PdfInfo>[nary];
      }
   }
}

  ==== SOURCE END ====

----- End Fatal Exception -------------------------------------------------

When I had ::auto_ptr<gen::PdfInfo> declared in the classes_def.xml file, scram always failed as genreflex kept issuing a warning that the type was unused. My attempts to force the type to exist all failed.

makortel commented 9 months ago

@cms-sw/xpog-l2 @cms-sw/generators-l2 Given the discovery that even the present IO rule to convert the auto_ptr to unique_ptr does not work properly for split input like AOD (#43923), I'm now wondering whether the 10_6_X UL AODSIM has already been used as input for MiniAOD production in a release newer than 11_0_0?

vlimant commented 9 months ago

Indeed, 10.6 AOD/AODSIM has not been used as input to the PAT/MINI => NANO workflow in recent releases, although this is the general plan we are targeting (instead of just redoing NANO on top of 10.6 MINI).

vlimant commented 9 months ago

assign xpog

cmsbuild commented 9 months ago

New categories assigned: xpog

@vlimant,@hqucms you have been requested to review this Pull request/Issue and eventually sign? Thanks

vlimant commented 9 months ago

assign generators

cmsbuild commented 9 months ago

New categories assigned: generators

@alberto-sanchez,@bbilin,@GurpreetSinghChahal,@mkirsano,@menglu21,@SiewYan you have been requested to review this Pull request/Issue and eventually sign? Thanks

makortel commented 9 months ago

Thanks @vlimant. If the gen::PdfInfo needs to be read from the 10_6_X AODSIM files, we (core) want to understand in detail where it is used, and which generators/samples include the object (or conversely, what the impact would be if this object were simply ignored). In this case it would also be great to have a test, run in IBs, that fails when the gen::PdfInfo is expected to be present but is not.

Then, we need ROOT team to fix https://github.com/cms-sw/cmssw/issues/43923, and give us further guidance on how to remove the explicit use of std::auto_ptr in the IO read rule.

menglu21 commented 6 months ago

Hi @makortel, all, sorry for the long silence. I did a quick check at the NanoAOD level: "gen::PdfInfo" is needed to extract the information in https://github.com/cms-sw/cmssw/blob/e6c8423e67c416edd3d8cadd042f75003ba5bc55/PhysicsTools/NanoAOD/python/globals_cff.py#L49-L57. For analyses using NanoAOD, basically only one PDF set and its replicas are stored. The GEN and PDF groups are planning to promote the use of multiple PDFs in analyses; if we don't want to increase the number of NanoAOD (which would rather contradict the motivation for NanoAOD), we need this information, i.e. "x1, x2, xpdf1..", to perform the post-production reweighting. I think the same applies to MiniAOD. I hope I understood the point "need to be readable in master for future MiniAOD" correctly and did the needed check.

makortel commented 6 months ago

To add, the plan shown in PPD general meetings https://indico.cern.ch/event/1415433/#1-news also explicitly mentions Re-Mini + Re-Nano of Run 2 UL (i.e. AOD(SIM) files produced with 10_6_X) with 15_0_X next year. Given the need described in https://github.com/cms-sw/cmssw/issues/43422#issuecomment-2110399133 it seems to me https://github.com/cms-sw/cmssw/issues/43923 is becoming critical.

We would also really need a test, that would be run in IBs, that would fail if any number extracted from gen::PdfInfo is incorrect.

vlimant commented 5 months ago

some observations (that puzzle me right now to a large extent). I picked up /eos/cms/store/mc/RunIISummer20UL18RECO/TTToHadronic_TuneCP5_RTT_13TeV-powheg-pythia8/AODSIM/106X_upgrade2018_realistic_v11_L1v1-v2/60009/F4A0C48A-4DC4-7D4C-BD6C-94F9492FAE97.root

ran PAT on top, dropping all products but generator ones

cmsDriver.py step4 -s PAT --era Run2_2018 -n 100 --conditions auto:phase1_2018_realistic --mc --datatier MINIAODSIM --eventcontent MINIAODSIM --filein file:F4A0C48A-4DC4-7D4C-BD6C-94F9492FAE97.root --no_exec --fileout file:step4.root

with process.MINIAODSIMoutput.outputCommands = ['drop *_*_*_*', 'keep *_generator_*_*'] added

then ran NANO on top, dropping all tables but genTable (see above)

cmsDriver.py step5 -s NANO -n 10 --mc --eventcontent NANOAODSIM --datatier NANOAODSIM --conditions auto:phase1_2018_realistic --era Run2_2018 --filein file:step4.root --nThreads 2 --no_exec --fileout file:step5.root

with process.NANOAODSIMoutput.outputCommands = [ 'drop *_*_*_*' ,'keep *_genTable_*_*'] added

and got a file out and can read non -1 values for x1

root [1] Events->Scan("Generator_x1:Generator_xpdf1")
************************************
*    Row   * Generator * Generator *
************************************
*        0 * 0.0593872 *         0 *
*        1 * 0.1169471 *         0 *
*        2 * 0.0290822 *         0 *

which tells me that somehow the pdf info was properly carried forward.

What am I missing ?

menglu21 commented 5 months ago

Hi @vlimant I'm not sure why the xpdf is not added. But to me it seems to be fine due to the fact that we have:

which are all we need to get the PDFs and for the PDF reweighting afterwards. Adding other experts who may know more details: @mkirsano @mseidel42 @smrenna

agrohsje commented 5 months ago

Dear @menglu21, unfortunately, you cannot recover the info from x1, x2 in Powheg, as this goes back to the underlying Born. So if you compare the weights stored in a Powheg file to the simple formula using the stored x1, x2, they will be different. We could do a quick study of how different they are, but I remember plots from Andy B. showing the discrepancy.

menglu21 commented 5 months ago

@agrohsje thanks for the information. It would be great if a check could be performed, and could you point me to Andy's materials? Another question: do you remember why the xpdf value is not stored even at GEN level?

agrohsje commented 5 months ago

That was Andy Buckley from ATLAS. So a really long time ago. But the code is still the same.

menglu21 commented 5 months ago

Hi @vlimant, I checked two samples, SUS-RunIISummer20UL16wmLHEGEN-00059 for Madgraph and TOP-RunIISummer20UL17wmLHEGEN-00705 for Powheg. The xpdf is missing from the output files (wmLHEGEN/mini/nano) because this information is not stored in the LHE file. LHE content of the first event using the gridpack in SUS-RunIISummer20UL16wmLHEGEN-00059 (using default seed 234567):

<event>
 8      3 +3.7867173e-01 5.50970200e+02 7.54677100e-03 9.28946200e-02
        2 -1    0    0  504    0 -0.0000000000e+00 +0.0000000000e+00 +7.5332151269e+02 7.5332151269e+02 0.0000000000e+00 0.0000e+00 -1.0000e+00
       21 -1    0    0  501  502 +0.0000000000e+00 -0.0000000000e+00 -1.4606968055e+03 1.4606968055e+03 0.0000000000e+00 0.0000e+00 1.0000e+00
       23  2    1    2    0    0 +3.1114993456e+01 -5.5261737331e+01 +3.6986755056e+02 3.8657988582e+02 9.2843959065e+01 0.0000e+00 0.0000e+00
      -15  1    3    3    0    0 +6.2995876021e+01 -4.5875475268e+01 +2.3171305183e+02 2.4447318001e+02 1.7770000000e+00 0.0000e+00 -1.0000e+00
       15  1    3    3    0    0 -3.1880882565e+01 -9.3862620638e+00 +1.3815449873e+02 1.4210670582e+02 1.7770000000e+00 0.0000e+00 1.0000e+00
       21  1    1    2  503  502 +1.5138909282e+02 +3.2381819151e+02 -9.7138273761e+02 1.0350658440e+03 0.0000000000e+00 0.0000e+00 1.0000e+00
       21  1    1    2  504  503 +3.6679892480e+01 +2.2418995629e+01 -3.5077928826e+02 3.5340364309e+02 0.0000000000e+00 0.0000e+00 1.0000e+00
        2  1    1    2  501    0 -2.1918397876e+02 -2.9097544980e+02 +2.4491918249e+02 4.3896894524e+02 0.0000000000e+00 0.0000e+00 -1.0000e+00
<scales pt_clust_4="13000.00000" pt_clust_5="13000.00000" pt_clust_6="423.95176" pt_clust_7="54.26423" pt_clust_8="13000.00000"></scales>
<mgrwt>
<rscale>  1 0.55097019E+03</rscale>
<asrwt>  2 0.54264231E+02 0.42395176E+03</asrwt>
<pdfrwt beam="1">  1       21 0.22472259E+00 0.55097019E+03</pdfrwt>
<pdfrwt beam="2">  1        2 0.11589562E+00 0.55097019E+03</pdfrwt>
<totfact> 0.34328695E-02</totfact>
</mgrwt>
... other reweight info 

You can see that "x1, x2, id1, id2 and scalePDF" are in the LHE record, and "xpdf" is not (it may be possible to compute it with some formula).

LHE content of the first event using gridpack in TOP-RunIISummer20UL17wmLHEGEN-00705 (using default seed 234567):

<event>
     13 300022  3.16466E+02  5.07622E+01 -1.00000E+00  1.36704E-01
      21    -1     0     0   504   505  0.000000000E+00  0.000000000E+00  7.279861986E+02  7.279861986E+02  0.000000000E+00  0.00000E+00  9.000E+00
      21    -1     0     0   502   503  0.000000000E+00  0.000000000E+00 -3.986538387E+02  3.986538387E+02  0.000000000E+00  0.00000E+00  9.000E+00
       6     2     1     2   502     0 -1.212793464E+01  1.203574101E+02  1.568244115E+02  2.625367547E+02  1.723329874E+02  0.00000E+00  9.000E+00
      -6     2     1     2     0   505 -3.039111003E+01 -9.262779379E+01  4.975211319E+02  5.351498357E+02  1.713316732E+02  0.00000E+00  9.000E+00
      21     1     1     2   504   503  4.251904467E+01 -2.772961633E+01 -3.250131834E+02  3.289534468E+02  3.814697266E-06  0.00000E+00  9.000E+00
      24     2     3     3     0     0  5.394904036E+01  9.524860833E+01  1.003580973E+02  1.719381720E+02  8.664981793E+01  0.00000E+00  9.000E+00
       5     1     3     3   502     0 -6.607697500E+01  2.510880179E+01  5.646631422E+01  9.059858274E+01  4.800000000E+00  0.00000E+00 -1.000E+00
     -24     2     4     4     0     0  2.723245182E+01  2.376296584E+00  2.045251353E+02  2.217556015E+02  8.122661251E+01  0.00000E+00  9.000E+00
      -5     1     4     4     0   505 -5.762356185E+01 -9.500409037E+01  2.929959965E+02  3.133942342E+02  4.800000000E+00  0.00000E+00  1.000E+00
      -1     1     6     6     0   506  1.121234308E+01 -8.626746795E+00 -7.673301907E+00  1.609431452E+01  1.000000000E-01  0.00000E+00  1.000E+00
       2     1     6     6   506     0  4.273669728E+01  1.038753551E+02  1.080313992E+02  1.558438575E+02  1.000000000E-01  0.00000E+00 -1.000E+00
       1     1     8     8   507     0  3.103236396E+01  2.913296668E+00  2.115051869E+02  2.137894969E+02  1.000000000E-01  0.00000E+00 -1.000E+00
      -2     1     8     8     0   507 -3.799912141E+00 -5.370000838E-01 -6.980051575E+00  7.966104528E+00  1.000000000E-01  0.00000E+00  1.000E+00
#rwgt            1           1   605.52990220408299           234567          27           0
<rwgt>
... other pdf reweight info

Here "scalePDF, id1, id2" are explicitly stored and "x1, x2" can be obtained from the pz of the two incoming partons; all of this information is consistent with the GEN event in TOP-RunIISummer20UL17wmLHEGEN-00705.root.
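For illustration, the pz-to-x relation can be sketched as below. This is a minimal sketch under two assumptions that are not stated in the LHE records themselves: the incoming partons are massless and collinear with the beam, and the beam energy is 6500 GeV (13 TeV collisions). The function and constant names are hypothetical. The built-in check uses the Madgraph event quoted earlier, because its pdfrwt lines provide x values to compare against: the incoming quark (id 2, pz = +753.32151269) matches 0.11589562 and the incoming gluon (id 21, pz = -1460.6968055) matches 0.22472259.

```cpp
// Momentum fraction of a massless incoming parton travelling along the
// beam axis: x = |pz| / E_beam.
constexpr double momentumFraction(double pz, double beamEnergy) {
  return (pz < 0.0 ? -pz : pz) / beamEnergy;
}

// Assumption: 13 TeV collisions, i.e. 6500 GeV per beam.
constexpr double kBeamEnergyGeV = 6500.0;

// True when a and b differ by less than tol.
constexpr bool closeTo(double a, double b, double tol) {
  return (a > b ? a - b : b - a) < tol;
}

// Cross-check against the pdfrwt values of the Madgraph event.
static_assert(closeTo(momentumFraction(+7.5332151269e+02, kBeamEnergyGeV),
                      0.11589562, 1e-7),
              "quark x does not match pdfrwt");
static_assert(closeTo(momentumFraction(-1.4606968055e+03, kBeamEnergyGeV),
                      0.22472259, 1e-7),
              "gluon x does not match pdfrwt");
```

The same function applied to the two incoming gluons of the Powheg event (pz = 727.9861986 and -398.6538387) gives the x1, x2 referred to above, under the same beam-energy assumption.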

So my conclusion is that the GEN information currently stored in miniaod/nanoaod is consistent with what is expected. We can move on to testing with std::unique_ptr<gen::PdfInfo>.

vlimant commented 5 months ago

There is no test or anything specific to do, since reading the plain 10.6 AODSIM seems to be ok for that product. @makortel any clue what is going on?

wddgit commented 4 months ago

Matti is on vacation. Unless something has changed since I last heard, he'll be back next week on July 8. Chris is also on vacation this week. Probably the best thing is to pause this conversation until next week.