cmsb2g / B2GAnaFW

Analysis framework for Beyond Two Generations (B2G) Physics Analysis Group (PAG) of the Compact Muon Solenoid (CMS) Experiment

Storing subjets #2

Closed ferencek closed 9 years ago

ferencek commented 9 years ago

Hi,

Is there a plan to store subjets of fat jets and corresponding "links" between the two jet collections?

Thanks, Dinko

osherson commented 9 years ago

Hi Dinko, yes, there is a plan to store them, along with the b-tagging for these subjets; it is being implemented now.

Thanks,

-Marc

ferencek commented 9 years ago

Thanks for the prompt feedback. For subjet b tagging from MiniAOD, https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideCMSDataAnalysisSchool2015BTaggingExercise#Subjet_b_tagging_and_subjet_flav can be used as reference.

-Dinko

dmajumder commented 9 years ago

We now have pruned subjets with btag in the Ntuplizer.

ferencek commented 9 years ago

@dmajumder, this is great news. What is the procedure now to go from fat jets to subjets?

dmajumder commented 9 years ago

@ferencek we provide indices corresponding to the subjets for each pruned fat jet (same procedure as BtagAnalyzer). There is still a bit of development here, since we store only 2 subjets, but are currently modifying it to store up to 4. This can then be used with other jet groomers.
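As a rough illustration of the index-based linking described above, here is a minimal sketch in plain Python. The branch names (`fatjet_pt`, `fatjet_subjet_idx`, `subjet_pt`, `subjet_btag`) and all numbers are hypothetical and are not the actual B2GAnaFW ntuple content; using -1 to mark an empty subjet slot is also an assumption.

```python
# Hypothetical flat-ntuple layout for one event: two parallel jet collections
# linked by integer indices. All names and values are illustrative only.
event = {
    "fatjet_pt": [412.0, 305.5],
    # One list of subjet indices per fat jet; -1 marks an unused slot.
    "fatjet_subjet_idx": [[0, 1], [2, -1]],
    "subjet_pt": [250.1, 140.3, 290.8],
    "subjet_btag": [0.91, 0.12, 0.47],
}


def subjets_of(event, fatjet_index):
    """Return (pt, btag) pairs for the subjets linked to one fat jet."""
    pairs = []
    for idx in event["fatjet_subjet_idx"][fatjet_index]:
        if idx < 0:  # skip empty slots
            continue
        pairs.append((event["subjet_pt"][idx], event["subjet_btag"][idx]))
    return pairs


# Walk from each fat jet to its subjets through the stored indices.
for i, pt in enumerate(event["fatjet_pt"]):
    print(i, pt, subjets_of(event, i))
```

Storing a fixed number of index slots per fat jet (2 now, up to 4 later) keeps the tree flat, at the cost of needing a sentinel value for jets with fewer subjets.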

rappoccio commented 9 years ago

Hi, Guys, I'm trying to optimize this, and there are a lot of things in the workflow that are wrong, extraneous, duplicated, or not necessary. One egregious problem is that there's no reason to rerun jets that are already run in RECO and in the miniAOD (AK4 and AK8). However, it seems that the subjet b-tagging recipe here [1] needs to actually re-run jets to get the subjets, or at least to have groomed and ungroomed jets have the same size [2].

This seems unnecessary to me. Half (!!!) of the CPU time is devoted to remaking jets that already exist. We used to be able to just run b-tagging on the subjets without needing to know anything about anything else since we passed the "explicit" jet track association. What's the new deal? Why can't we have some options to do this the "old way" anymore?

[1] https://twiki.cern.ch/twiki/bin/viewauth/CMS/SWGuideCMSDataAnalysisSchool2015BTaggingExercise#Subjet_b_tagging_and_subjet_flav

[2]

Begin processing the 1st record. Run 1, Event 1, LumiSection 1 at 28-Jan-2015 15:59:20.745 CST

%MSG-e TooFewReclusteredJets: JetFlavourClustering:patJetFlavourAssociationAK8PFCHSPrunedSubjets 28-Jan-2015 15:59:20 CST Run: 1 Event: 1
There are fewer reclustered (7) than original jets (10). Please check that the jet algorithm and jet size match those used for the original jet collection.
%MSG

%MSG-e JetMatchingFailed: JetFlavourClustering:patJetFlavourAssociationAK8PFCHSPrunedSubjets 28-Jan-2015 15:59:20 CST Run: 1 Event: 1
Matched reclustered jet 3 and original jet 1 are separated by dR=1.8396 which is greater than the jet size R=0.8. This is not expected so please check that the jet algorithm and jet size match those used for the original jet collection.
%MSG

%MSG-e JetMatchingFailed: JetFlavourClustering:patJetFlavourAssociationAK8PFCHSPrunedSubjets 28-Jan-2015 15:59:20 CST Run: 1 Event: 1
Matched reclustered jet 4 and original jet 3 are separated by dR=1.36404 which is greater than the jet size R=0.8. This is not expected so please check that the jet algorithm and jet size match those used for the original jet collection.
%MSG

%MSG-e JetMatchingFailed: JetFlavourClustering:patJetFlavourAssociationAK8PFCHSPrunedSubjets 28-Jan-2015 15:59:20 CST Run: 1 Event: 1
Matched reclustered jet 5 and original jet 6 are separated by dR=1.29835 which is greater than the jet size R=0.8. This is not expected so please check that the jet algorithm and jet size match those used for the original jet collection.
%MSG

%MSG-e JetMatchingFailed: JetFlavourClustering:patJetFlavourAssociationAK8PFCHSPrunedSubjets 28-Jan-2015 15:59:20 CST Run: 1 Event: 1
Matched reclustered jet 5 and original jet 7 are separated by dR=1.26426 which is greater than the jet size R=0.8. This is not expected so please check that the jet algorithm and jet size match those used for the original jet collection.
%MSG

%MSG-e JetMatchingFailed: JetFlavourClustering:patJetFlavourAssociationAK8PFCHSPrunedSubjets 28-Jan-2015 15:59:20 CST Run: 1 Event: 1
Matched reclustered jet 5 and original jet 9 are separated by dR=1.4033 which is greater than the jet size R=0.8. This is not expected so please check that the jet algorithm and jet size match those used for the original jet collection.
%MSG

%MSG-e JetPtMismatch: JetFlavourClustering:patJetFlavourAssociationAK8PFCHSPrunedSubjets 28-Jan-2015 15:59:20 CST Run: 1 Event: 1
The reclustered and original jet 0 have different Pt's (122.549 vs 121.66 GeV, respectively). Please check that the jet algorithm and jet size match those used for the original jet collection and also make sure the original jets are uncorrected. In addition, make sure you are not using CaloJets which are presently not supported.
In extremely rare instances the mismatch could be caused by a difference in the machine precision in which case make sure the original jet collection is produced and reclustering is performed in the same job.
%MSG

ferencek commented 9 years ago

Hi, Sal,

In Run 1 the explicit jet track association was never used in b tagging, neither for subjets nor standard jets. For Run 2 the explicit JTA is now enabled for subjets (unfortunately, this makes b tagging dependent on the jet grooming algorithm which is not ideal but that's what it is). The extra use of resources comes from the subjet flavor which requires groomed and ungroomed fat jets as well as subjets to be present to work properly. Some wasting of resources also comes from the fact that jet constituents and generator-level hadrons and partons need to be reclustered to get the jet flavor and this becomes more severe in the case of subjets that require more than one jet collection to be present. However, the new jet flavor, even though it uses FastJet, is actually faster than the old flavor algorithm.

Since there are no subjets stored in MiniAOD, for now we will be forced to make them on the fly. So for substructure-based analyses, a lot of stuff will simply need to be remade on the fly. I'm not sure what alternative is being suggested.

From the above printout it looks like something was not configured correctly.

rappoccio commented 9 years ago

Hi, Dinko,

OK, but can't we do all of that in the PRODUCTION of the miniaod's instead of re-running it in entirety? In the meantime can't we just switch off the subjet flavor until the next miniaod production? It's not so critical for PHYS14.

For the B2G ntuples, the scope is to give non-expert end users access to a reasonable default configuration to use the tools that are already in miniAOD. Most users do not need the subjets to perform their analysis and can get by with, say, the CMS top tagger or the "tau21 + groomed mass" W tagger. The material that is already in the miniAOD is already sufficient for their purposes. The only thing that is missing is between two and four numbers : the b-discriminators for the subjets. We can and should provide these as a "high level" access for end-user analyses.

Cheers, Sal

ferencek commented 9 years ago

Hi, Sal,

On 01/28/2015 06:32 PM, rappoccio wrote:

> Hi, Dinko,
>
> OK, but can't we do all of that in the PRODUCTION of the miniaod's instead of re-running it in entirety?

In principle, it should be possible and I guess that should probably be our longer term goal.

> In the meantime can't we just switch off the subjet flavor until the next miniaod production? It's not so critical for PHYS14.

Yes, in the call to the addJetCollection() function for subjets, just add the following option

getJetMCFlavour = False

That should disable adding the flavor information to subjets.

Cheers, Dinko


alschmid commented 9 years ago

Hi, after thinking a bit I agree. One of our goals for PHYS14 was to estimate the b-tag performance with the new algorithms and the new way of accessing them through miniAOD, etc. This would require truth information for subjets. However, this is not in the scope of the B2G ntuple. This can be done directly in miniAOD or the btag ntuple. So indeed, let's keep the B2G ntuple as slim as possible for the standard analysis end user (not for commissioning of physics objects). Thanks a million for cleaning it up.

cheers, alex


rappoccio commented 9 years ago

Hi, Dinko,

OK, I think getting this into MiniAOD is a huge priority. Can you take care of that?

For now I will switch off the MC flavor.

Thanks, Sal

ferencek commented 9 years ago

@alschmid, I must admit I don't completely understand what we are concerned about here. Running or not running the subjet flavor has nothing to do with the slimness of the final ntuples. Let me explain what I mean. MiniAOD is designed to be, well, "mini", which necessarily means that not everything one might think of can be crammed into it. However, precisely for this reason packed PF candidates and some minimal MC truth info are kept, allowing people to remake jets, b tagging, etc. or even develop and test new algorithms. Now, analyses with very simple event selection and relatively standard physics objects could in principle be done straight from MiniAOD. However, the B2G case is a bit more complex and certain objects need to be created on the fly and then stored in the final flat ROOT trees. And this is precisely the point. B2G at the moment needs to remake too many objects on the fly for direct MiniAOD use to be an option. That's where B2G ntuples come into play and significantly simplify and speed up later steps in the analysis. And I guess the idea is that B2G ntuples do not get recreated every two days, so I don't see why using some extra CPU cycles for making more complete B2G ntuples is such a big problem. Just recall the B2G PAT-tuples workflow. There was a ton of stuff created on the fly and the final output was quite heavy.

ferencek commented 9 years ago

@rappoccio, wouldn't this imply that we also store subjets in MiniAOD?

dmajumder commented 9 years ago

@rappoccio Some of the jet matching errors were caused by different pT cuts on the AK8 and the pruned AK8 jets. Those should be fixed now, plus a host of other stuff that accumulated over the development cycle. @ferencek As for the error messages from the jet flavour tool, it is switched off for subjets, but is the pathological behaviour also expected for pruned jets? There are no dR mismatches, just JetPtMismatch, so perhaps it is due to some JEC issue?
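To make the failure mode concrete (different pT cuts leave fewer jets in one collection, so some jets end up matched to a partner farther away than the jet size), here is a minimal sketch in plain Python. It is not the actual JetFlavourClustering code: the nearest-neighbour matching, the function names and the jet values are all assumptions made for illustration.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi]."""
    dphi = phi1 - phi2
    while dphi > math.pi:
        dphi -= 2.0 * math.pi
    while dphi < -math.pi:
        dphi += 2.0 * math.pi
    return dphi


def delta_r(jet_a, jet_b):
    """Distance in the (eta, phi) plane between two jets given as dicts."""
    return math.hypot(jet_a["eta"] - jet_b["eta"],
                      delta_phi(jet_a["phi"], jet_b["phi"]))


def check_matching(original, reclustered, jet_size):
    """Match each original jet to its nearest reclustered jet; list problems."""
    problems = []
    if len(reclustered) < len(original):
        problems.append("fewer reclustered (%d) than original jets (%d)"
                        % (len(reclustered), len(original)))
    for i, jet in enumerate(original):
        if not reclustered:
            break
        j, nearest = min(enumerate(reclustered),
                         key=lambda item: delta_r(jet, item[1]))
        dr = delta_r(jet, nearest)
        if dr > jet_size:
            problems.append("original jet %d and reclustered jet %d: dR=%.3f > R=%.1f"
                            % (i, j, dr, jet_size))
    return problems


# Hypothetical jets: the second collection is built with a tighter pT cut.
original = [
    {"pt": 300.0, "eta": 0.1, "phi": 0.2},
    {"pt": 250.0, "eta": -1.0, "phi": 2.9},
    {"pt": 120.0, "eta": 1.5, "phi": -1.0},
]
reclustered = [jet for jet in original if jet["pt"] > 150.0]

problems = check_matching(original, reclustered, jet_size=0.8)
for line in problems:
    print(line)
```

With identical cuts on both collections the check returns no problems, which is consistent with the fix described above.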

ferencek commented 9 years ago

@dmajumder, the jet flavor should be disabled for any type of groomed fat jets. Here is a quote from https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideBTagMCTools

The hadron-based jet flavour should not be run on groomed jets or any modification of the original jets where some of the jet constituents have been removed. This is because reclustering constituents of such jets will generally result in a different jet configuration (only original jets with all of their constituents included are stable under reclustering).

The idea is that the fat jet flavor can be taken from the ungroomed fat jet.

There is another subtlety related to MiniAOD. Even though it is possible in principle, one should not re-run the jet flavor on the slimmed jets stored in MiniAOD. These are PAT jets which already have the flavor information stored, and reclustering will probably complain about pT mismatches for at least two reasons: 1) the JECs are already applied, so the reclustered and stored pT will be different, and 2) the reclustering will be done using packed constituents, which have reduced precision; this can again lead to slight pT differences between RECO and MiniAOD.
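The second effect can be illustrated with a small sketch in plain Python. This is only an analogy: IEEE 754 half precision (via the standard `struct` module) is used here as a stand-in for reduced-precision storage and is not claimed to be the packing scheme MiniAOD actually uses; the constituent values are made up, and a scalar pT sum stands in for the jet pT.

```python
import struct


def to_half_precision(value):
    """Round-trip a float through IEEE 754 half precision (16 bits)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Hypothetical constituent pT values in GeV.
constituent_pts = [45.6789, 30.1234, 12.3456, 7.8912, 3.1415]

# Scalar sums before and after the precision loss.
full_sum = sum(constituent_pts)
packed_sum = sum(to_half_precision(pt) for pt in constituent_pts)

print("full precision sum   : %.4f GeV" % full_sum)
print("reduced precision sum: %.4f GeV" % packed_sum)
print("difference           : %.4f GeV" % (packed_sum - full_sum))
```

The difference is small but nonzero, which is the kind of mismatch an exact pT comparison between stored and reclustered jets would flag.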

dmajumder commented 9 years ago

@ferencek Thanks Dinko, indeed we do not store pruned jets, so this is quite unnecessary. I have switched it off.

ferencek commented 9 years ago

BTW, to proceed as efficiently as possible, all open issues should probably be discussed in some sort of meeting (either one of the B2G meetings or a private meeting) where we all sit down and talk to each other. What do you guys think?

alschmid commented 9 years ago

@ferencek yes, a private meeting would be good. I think we don't contradict each other. The B2G workflow should help with making complex tagging objects which are not in miniAOD. It should be those objects which are used in the analyses. And many analyses need subjet b-tags (not necessarily the truth flavour). I can't judge if the truth flavour is too expensive right now, but if it can be added without exploding the workflow it would be good.

dmajumder commented 9 years ago

@ferencek Good idea. However the presentation on 20 Jan was explicitly to ask for feedback on open issues after running the framework. I wouldn't mind another discussion like in Dec with the core developers. Better to aim it after B2G jamboree on 03 Feb, and it would be a good time to start migrating to 73X.

rappoccio commented 9 years ago

> Just recall the B2G PAT-tuples workflow. There was a ton of stuff created on the fly and the final output was quite heavy.

Yes, that's exactly the point. ;) The entire idea was to ELIMINATE the B2G PAT-tuple workflow. We worked extensively with the miniAOD authors many months ago to address this. We do NOT want to have to organize a multi-month processing again, since it's a relatively big waste of time.

About storing subjets : For the majority of users, they don't need the actual subjets. They will need the b discriminator values, and maybe the four vectors. That's about it.

alschmid commented 9 years ago

@rappoccio yes the subjet 4vectors and the bdiscriminator values and the mc truth is what I meant...That's the analysis level information needed. And of course the calorimeter raw data

ferencek commented 9 years ago

Storing subjet information in MiniAOD definitely deserves a discussion. I'm not sure what would be the best way to store that info in MiniAOD.

rappoccio commented 9 years ago

We don't need the subjets in miniAOD per se. Ideally we will only need the subjet b-tagging information and perhaps the four-vectors, both of which can be stored as user data.

ferencek commented 9 years ago

@rappoccio, I see. Yes, if the subjet info can be embedded in the fat jets user data, that should be sufficient for most users.

rappoccio commented 9 years ago

Hi, Folks,

OK, for the short term, I fixed a bunch of stuff. It's not quite done yet, but I have to wait for an answer from some FW experts.

https://hypernews.cern.ch/HyperNews/CMS/get/b2g-selections/211.html

The next stages are :

  1. I have to get the "thinned collections" deployed. I was hoping to have it in for 740 but the last open release is Tuesday so... doubt that will happen.
  2. Someone has to get the subjet b-tagging into the miniAOD in a nice format. Ideally for 74x's miniAOD production. (Dinko, can you handle this?)
  3. We also need someone to store the subjets of miniAOD as userData. This should be a few lines in the miniAOD python file. (Devdatta, can you handle this?)

Cheers, Sal

ferencek commented 9 years ago

Points 2. and 3. are tied closely together. I guess we want the SoftDrop subjets. Looking at what is stored in one of the recent RelVals /RelValTTbar_13/CMSSW_7_4_0_pre5-MCRUN2_73_V7-v1/GEN-SIM-RECO I see the following

Type                        Module                       Label       Process   
-------------------------------------------------------------------------------
vector<reco::PFJet>         "ak8PFJetsCHS"               ""          "RECO"
vector<reco::BasicJet>      "ak8PFJetsCHSSoftDrop"       ""          "RECO"    
vector<reco::PFJet>         "ak8PFJetsCHSSoftDrop"       "SubJets"   "RECO"

so it shouldn't be too complicated to get b tagging running over the SoftDrop subjets. I'm a bit less certain about how to pick up the subjet b-tag discriminators and store them in the slimmedJetsAK8 user data but maybe you or Devdatta already know the answer. In any case, I have a few PAT jetTools updates to implement these days and in parallel I will look to see what can be done about point 2.

rappoccio commented 9 years ago

Hi, Dinko,

Awesome, thanks! Yes, we want the soft-drop subjets, I will also switch to that in miniAOD in any case. To store the subjet discriminators, it should probably simply be some version of our "BoostedJetMerger" functionality, and then we can just store what we want as user data like we're currently doing for the masses.

Do you think you have time to do this by the 74x deadline?

Cheers, Sal


ferencek commented 9 years ago

Hi, Sal,

I'll do my best to have this done before the 740 deadline.

-Dinko


rappoccio commented 9 years ago

Thanks Dinko!

ferencek commented 9 years ago

Hi, Sal, all

I had a look at getting subjet b tagging in MiniAOD and it looks like the simplest way to achieve this would be to add addJetCollection(...) calls for soft drop jets and subjets followed by the BoostedJetMerger in http://cmslxr.fnal.gov/source/PhysicsTools/PatAlgos/python/slimming/miniAOD_tools.py?v=CMSSW_7_4_0_pre6#0080

However, what is less clear to me is how to include the subjet info in the AK8 userData. It looks like we would need something along the lines of RecoJetDeltaRValueMapProducer (http://cmslxr.fnal.gov/source/CommonTools/RecoAlgos/plugins/JetDeltaRValueMapProducer.cc?v=CMSSW_7_4_0_pre6) which would match AK8 jets with packed soft drop jets and then grab the required subjet info using the daughter pointers. This is because dynamic casting to pat::Jet is required and I guess there is no way for the StringObjectFunction inside RecoJetDeltaRValueMapProducer to handle something like that.

BTW, what would be the easiest way to store the subjet 4-vectors? As individual components or as a full vector? Also, should we store raw or corrected subjet 4-vectors?

Cheers, Dinko

ferencek commented 9 years ago

Hi Guys,

Here is a first draft of storing subjet b-tag discriminators in MiniAOD https://github.com/cms-btv-pog/cmssw/compare/PATJetMiniAODImprovements_from-CMSSW_7_4_0_pre6...B2GMiniAODSubJets_PATJetMiniAODImprovements_from-CMSSW_7_4_0_pre6

There are probably better solutions but this one seems to work. Let me know what you think.

Cheers, Dinko

rappoccio commented 9 years ago

Hi, Dinko,

Thanks for following up so quickly!

Yes, that's what I had in mind. The dynamic cast can be handled using the StringObjectFunction, I think, but I could be wrong. It would be easiest to store the subjet 4-vectors directly as 4-vectors, most likely.

Let me know if this doesn't work and I can take a look tomorrow.

Cheers, Sal


ferencek commented 9 years ago

Hi, Sal,

You were right about the StringObjectFunction: it can handle dynamic casting, or does some other framework magic, to evaluate pat::Jet methods from daughter pointers. However, for this to work the expression parser needs to be set to lazy mode. I extended the JetDeltaRValueMapProducer so that lazy parsing can be used and multiple values can be evaluated with a single jet matching. I pushed the updated code to the same branch linked above.

For the subjet 4-vectors I have not done anything yet but it looks like one way to store them would be to do something along the lines of https://github.com/cms-sw/cmssw/blob/CMSSW_7_3_0/PhysicsTools/PatAlgos/test/private/PATUserDataTestModule.cc#L189-L207 This implies writing yet another producer and as far as I can tell, subjet 4-vectors would have to be stored in a separate collection in MiniAOD and then pointed to from slimmedJetsAK8. But I could be wrong about this since I don't have much experience with pat::UserData.

Cheers, Dinko

rappoccio commented 9 years ago

Fantastic, thanks!

There are dictionaries for 4-vectors so you can just do something like this :

https://github.com/cmsb2g/B2GAnaFW/blob/master/python/b2gedmntuples_cff.py#L478

 cms.PSet(
    tag = cms.untracked.string("subjetIndex0"),
    quantity = cms.untracked.string("? numberOfDaughters > 0 ? daughterPtr(0).p4() ")
 ),

Cheers, Sal

ferencek commented 9 years ago

Hi, Sal,

I'm not sure how dictionaries for vectors of 4-vectors would help with embedding the subjet 4-vectors into PAT jets. The simplest (but maybe not the most elegant) solution would be to simply embed this info as 4 floats per subjet. Everything else will require additional developments so unless somebody can implement these developments by tomorrow (I can't promise because of some problems that have arisen with the planned 740 b-tagging updates), I would go with storing the 4-vector components as floats.

Cheers, Dinko


cmsb2g commented 9 years ago

Hi, Dinko,

I don't think there are any developments / dictionaries needed IIRC. I'll take a look today.

Cheers, Sal

ferencek commented 9 years ago

OK, great. Let us know what you find out.

-Dinko


rappoccio commented 9 years ago

Hi, Dinko, All,

OK I had a look at this. The ultimate limitation is that the StringObjectFunction does not actually allow 4-vectors as outputs, although this is supported by the pat::Object. The only option is to therefore store the 4-vector components as floats. Grrr...

The good news is, I've sped up our workflow sufficiently now using the preclustering that we could probably just rerun in any case if we want to store the subjets. It should still be fast enough. At this point I think either is a viable option.

Cheers, Sal

ferencek commented 9 years ago

Hi, Sal,

Should I go ahead with adding subjet pT, eta, phi, and mass (I guess this is a bit more convenient than px, py, pz, and energy)? Then, if needed, people can recreate the subjet 4-vectors at the analysis level.
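For reference, rebuilding a 4-vector from the four stored floats is a short calculation. The sketch below uses plain Python with made-up subjet values; the function names are illustrative and not part of any CMSSW or B2GAnaFW interface.

```python
import math


def p4_from_pt_eta_phi_m(pt, eta, phi, mass):
    """Convert (pT, eta, phi, mass) to Cartesian (px, py, pz, E)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return px, py, pz, energy


def invariant_mass(vectors):
    """Invariant mass of the sum of several (px, py, pz, E) four-vectors."""
    px, py, pz, energy = (sum(component) for component in zip(*vectors))
    m2 = energy * energy - (px * px + py * py + pz * pz)
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding


# Hypothetical subjets as (pT [GeV], eta, phi, mass [GeV]).
subjets = [(250.0, 0.30, 1.20, 20.0), (140.0, 0.55, 1.45, 12.0)]
vectors = [p4_from_pt_eta_phi_m(*subjet) for subjet in subjets]
print("mass of the subjet pair: %.2f GeV" % invariant_mass(vectors))
```

In ROOT-based analysis code the same conversion is available through `TLorentzVector::SetPtEtaPhiM`.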

Cheers, Dinko


rappoccio commented 9 years ago

Hey Dinko

If you get a chance, sure.

Cheers Sal


ferencek commented 9 years ago

OK, done https://github.com/cms-btv-pog/cmssw/compare/B2GMiniAODSubJets_PATJetMiniAODImprovements_from-CMSSW_7_4_0_pre6

I checked the impact on the MiniAOD event size by running over 500 events from /RelValProdTTbar_13/CMSSW_7_4_0_pre5-MCRUN2_73_V7-v1/AODSIM and the increase was ~0.25%. I guess we can afford this.

If there are no objections, I will make a PR using the above branch.

Best, Dinko

rappoccio commented 9 years ago

Hi, Dinko,

Wow, I like this implementation, nice work :).

One other thing, though: we should add the top-tag subjets from the CMS top tagger AND the W/Z/H-tag subjets from soft drop. Do you want to handle this, or should I? I guess the names would be something like

etc.

What do you think?

Cheers, Sal

ferencek commented 9 years ago

PR created https://github.com/cms-sw/cmssw/pull/7549