Open makortel opened 1 year ago
assign core
New categories assigned: core
@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @makortel Matti Kortelainen.
@Dr15Jones, @rappoccio, @smuzaffar, @makortel, @sextonkennedy, @antoniovilela can you please review it and eventually sign/assign? Thanks.
edm::OwnVector<SiPixelRecHit>
SiPixelRecHit
DataFormats/TrackerRecHit2D/src/classes_def.xml
edm::OwnVector<SiStripRecHit1D>
SiStripRecHit1D
SiStripRecHit1DCollectionOld
and visible in DataFormats/TrackerRecHit2D/src/classes_def.xml
edm::OwnVector<SiStripRecHit2D>
SiStripRecHit2D
SiStripRecHit2DCollectionOld
and visible in DataFormats/TrackerRecHit2D/src/classes_def.xml
edm::OwnVector<SiTrackerMultiRecHit>
SiTrackerMultiRecHit
DataFormats/TrackerRecHit2D/src/classes_def.xml
edm::OwnVector<MTDTrackingRecHit>
MTDTrackingRecHit
MTDTrackingOwnVector
and visible in DataFormats/TrackerRecHit2D/src/classes_def.xml
edm::OwnVector<FastTrackerCluster>
FastTrackerCluster
edm::RangeMap
that is aliased to FastTrackerClusterCollection
FastSimDataFormats/External/src/classes_def.xml
edm::OwnVector<CSCStripHit>
CSCStripHit
edm::RangeMap
of it aliased as CSCStripHitCollection
edm::OwnVector<SiStripMatchedRecHit2D>
SiStripMatchedRecHit2D
, could become std::vector<SiStripMatchedRecHit2D>
std::vector<std::unique_ptr<T>>
edm::OwnVector<BaseTrackerRecHit>
BaseTrackerRecHit
itself is abstract, likely need the dynamic polymorphism
RecoTracker/TkSeedGenerator/plugins/MultiHitFromChi2EDProducer.cc
, as storage for pointers used elsewhere
std::vector<std::unique_ptr<BaseTrackerRecHit>>
to edm::OwnVector<BaseTrackerRecHit>
, so should be simple to switch now
std::unique_ptr
yet
std::vector<std::unique_ptr<BaseTrackerRecHit>>
edm::OwnVector<TrackingRegion>
RecoTracker/TkTrackingRegions/interface/TrackingRegionEDProducerT.h
has a comment that at the time the friendly class name didn't support std::unique_ptr
yet
std::vector<std::unique_ptr<TrackingRegion>>
edm::OwnVector<CSCRecHit2D>
CSCRecHit2D
edm::RangeMap
of it aliased as CSCRecHit2DCollection
std::vector<CSCRecHit2D>
edm::OwnVector<CSCSegment>
CSCSegment
edm::RangeMap
of it aliased as CSCSegmentCollection
std::vector<CSCSegment>
edm::OwnVector<DTSLRecCluster>
DTSLRecCluster
edm::RangeMap
of it aliased as DTRecClusterCollection
std::vector<DTSLRecCluster>
edm::OwnVector<DTRecHit1DPair>
DTRecHit1DPair
edm::RangeMap
of it aliased as DTRecHitCollection
std::vector<DTRecHit1DPair>
edm::OwnVector<DTSLRecSegment2D>
DTSLRecSegment2D
edm::RangeMap
of it aliased as DTRecSegment2DCollection
std::vector<DTSLRecSegment2D>
edm::OwnVector<DTRecSegment4D>
DTRecSegment4D
edm::RangeMap
of it aliased as DTRecSegment4DCollection
std::vector<DTRecSegment4D>
edm::OwnVector<GEMCSCSegment>
GEMCSCSegment
edm::RangeMap
of it aliased as GEMCSCSegmentCollection
std::vector<GEMCSCSegment>
edm::OwnVector<GEMRecHit>
GEMRecHit
edm::RangeMap
of it aliased as GEMRecHitCollection
std::vector<GEMRecHit>
edm::OwnVector<GEMSegment>
GEMSegment
edm::RangeMap
of it aliased as GEMSegmentCollection
std::vector<GEMSegment>
edm::OwnVector<ME0RecHit>
ME0RecHit
edm::RangeMap
of it aliased as ME0RecHitCollection
std::vector<ME0RecHit>
edm::OwnVector<ME0Segment>
ME0Segment
edm::RangeMap
of it aliased as ME0SegmentCollection
std::vector<ME0Segment>
edm::OwnVector<RPCRecHit>
RPCRecHit
edm::RangeMap
of it aliased as RPCRecHitCollection
std::vector<RPCRecHit>
The DTRecSegment4D, RPCRecHit, GEMRecHit, and GEMSegment are coupled via MuRecObjBaseProducer (DPGAnalysis/MuonTools/interface/MuLocalRecoBaseProducer.h). Either these four cases of edm::OwnVector need to be migrated together, or the code of MuRecObjBaseProducer is replicated for the std::vector case for the duration of the migration of those classes.
The CSCRecHit2D, DTRecHit1DPair, and RPCRecHit are coupled via MuonDetCleaner (TauAnalysis/MCEmbeddingTools/plugins/MuonDetCleaner.h). Either these three cases of edm::OwnVector need to be migrated together, or the code of MuonDetCleaner is replicated for the std::vector case for the duration of the migration of those classes.
edm::OwnVector<TrackingRecHit>
TrackingRecHit
is a base class for a large class hierarchy. TrackingRecHit
itself is abstract, and likely needs the dynamic polymorphism
edm::OwnVector<FastTrackerRecHit>
FastTrackerRecHit
is a base class for 3 inheriting classes
FastTrackerRecHitCollection
FastSimulation/TrackingRecHitProducer/plugins/FastTrackerRecHitMatcher.cc
and FastSimulation/TrackingRecHitProducer/plugins/TrackingRecHitProducer.cc
FastSimulation/Tracking/plugins/FastTrackerRecHitMaskProducer.cc
and RecoTracker/TkSeedingLayers/src/SeedingLayerSetsBuilder.cc
edm::OwnVector<reco::BaseTagInfo>
reco::BaseTagInfo
is a base class for a hierarchy of 15+ inheriting classes, likely needs the dynamic polymorphism
edm::OwnVector<reco::PFBlockElement>
reco::PFBlockElement
is a base class for 5 inheriting classes, presumably needs the dynamic polymorphism
reco::PFBlock
std::vector<reco::PFBlock>
RecoParticleFlow/PFProducer/plugins/PFBlockProducer.cc
and RecoParticleFlow/PFSimProducer/plugins/SimPFProducer.cc
RecoParticleFlow/PFProducer/plugins/MLPFProducer.cc
, RecoParticleFlow/PFProducer/plugins/PFEGammaProducer.cc
, RecoParticleFlow/PFProducer/plugins/PFProducer.cc
, Validation/RecoParticleFlow/plugins/PFAnalysisNtuplizer.cc
MLPFProducer
suggests capability to persist would be useful
edm::OwnVector<pat::UserData>
pat::UserDataCollection
pat::UserHolder<T>
in DataFormats/PatCandidates/interface/UserData.h
pat::UserData
points to some concrete pat::UserHolder<T>
)
pat::PatObject<ObjectType>
PhysicsTools/SelectorUtils/interface/VersionedIdProducer.h
edm::ValueMap<vid::CutFlowResult>>
, and copies the vid::CutFlowResult
to pat::UserData
pat::UserDataCollection
is needed
pat::UserHolder<T>
can be found from what dictionaries are declared in DataFormats/PatCandidates/src/classes_def_user.xml
and DataFormats/PatCandidates/src/classes_def_other.xml
edm::OwnVector<reco::Candidate>
reco::Candidate
is a base class of a large class hierarchy
CandidateCollection
, also dictionary is declared
edm::AssociationMap
, edm::AssociationVector
, edm::Association
template instantiations for dictionary declarations
We could develop an std::variant-based "OwnVector" that would provide a similar interface as edm::OwnVector (i.e. polymorphic access to the base class pointer/reference), but where the list of all possible concrete classes is defined explicitly. We need to test that schema evolution works when adding new possible concrete types to the variant. But would this be convenient for large class hierarchies, like TrackingRecHit and reco::Candidate, or should we think of different, more invasive solutions?
If we proceed with the std::variant-based OwnVector, we'd also need support for std::variant in TTree before deployment.
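To make the idea concrete, here is a minimal sketch of what a std::variant-based container with base-class access might look like. VariantVector, Shape, Circle and Square are hypothetical names invented for this illustration; nothing here is an existing CMSSW or ROOT class, and persistence is not addressed at all.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical toy hierarchy, only for illustration.
struct Shape {
  virtual ~Shape() = default;
  virtual std::string name() const = 0;
};
struct Circle : Shape {
  std::string name() const override { return "circle"; }
};
struct Square : Shape {
  std::string name() const override { return "square"; }
};

// Stores objects by value in a variant, but exposes them through the base class.
// The full list of concrete types Ts... must be spelled out explicitly.
template <typename Base, typename... Ts>
class VariantVector {
  static_assert((std::is_base_of_v<Base, Ts> && ...), "every alternative must inherit from Base");

public:
  template <typename T>
  void push_back(T value) {
    data_.emplace_back(std::move(value));
  }

  std::size_t size() const { return data_.size(); }

  // Polymorphic access: returns a reference to the held alternative as Base.
  Base const& operator[](std::size_t i) const {
    assert(i < data_.size());
    return std::visit([](auto const& v) -> Base const& { return v; }, data_[i]);
  }

private:
  std::vector<std::variant<Ts...>> data_;
};
```

A VariantVector<Shape, Circle, Square> accepts push_back of a Circle or a Square, and operator[] then dispatches name() virtually. Note that the container type itself names every concrete class, which is exactly the property whose convenience is questioned above for large hierarchies.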
Some (many?) of the classes described in 2 and 3 were used in Fireworks. I'm not sure Fireworks needs any specific treatment beyond the migration described in 2 and 3. Backwards compatibility likely breaks in both cases.
assign reconstruction,fastsim,xpog,visualization
FYI @cms-sw/trk-dpg-l2 @cms-sw/tracking-pog-l2 @cms-sw/muon-dpg-l2 @cms-sw/muon-pog-l2 @cms-sw/btv-pog-l2 @cms-sw/pf-l2
New categories assigned: fastsim,xpog,reconstruction,visualization
@mdhildreth,@jfernan2,@sbein,@ssekmen,@Dr15Jones,@simonepigazzini,@makortel,@mandrenguyen,@alja,@vlimant,@civanch you have been requested to review this Pull request/Issue and eventually sign? Thanks
For the cases listed in https://github.com/cms-sw/cmssw/issues/42734#issuecomment-1708901672 (migrating a bunch of cases from edm::OwnVector<T> to std::vector<T>), I'm wondering whether 14_0_X would be a reasonable point for (likely) breaking the backwards compatibility?
> We could develop an std::variant-based "OwnVector" that would provide a similar interface as edm::OwnVector (i.e. polymorphic access to the base class pointer/reference), but where the list of all possible concrete classes is defined explicitly.
A major downside to this design is that it inverts the code coupling. Presently, packages using the base class only have to depend on the package with the base class. Using a std::variant would require each such package to depend on all packages that contain a class inheriting from the base class.
Additionally, adding a new type that inherits from the base class means updating all uses of std::variant.
Just to remind that edm::OwnVector<TrackingRecHit> is also persisted in MINIAOD (i.e. wherever a breaking change is introduced, old MINIAOD can't be used to produce NANOAOD in a newer release, for example):
Singularity> edmDumpEventContent root://cms-xrd-global.cern.ch//store/mc/RunIISummer20UL16MiniAODv2/DYJetsToMuMu_H2ErratumFix_TuneCP5_13TeV-powhegMiNNLO-pythia8-photos/MINIAODSIM/106X_mcRun2_asymptotic_v17-v2/2820000/A140E69B-7E17-5942-8B41-0DCEB11597ED.root | grep slimmedMuonTrackExtras
edm::OwnVector<TrackingRecHit,edm::ClonePolicy<TrackingRecHit> > "slimmedMuonTrackExtras" "" "PAT"
edmNew::DetSetVector<SiPixelCluster> "slimmedMuonTrackExtras" "" "PAT"
edmNew::DetSetVector<SiStripCluster> "slimmedMuonTrackExtras" "" "PAT"
vector<reco::TrackExtra> "slimmedMuonTrackExtras" "" "PAT"
Singularity>
> Just to remind that edm::OwnVector<TrackingRecHit> is also persisted in MINIAOD (ie wherever a breaking change is introduced, old MINIAOD can't be used to produce NANOAOD in a newer release for example):
Thanks Josh. Is the reading of a MINIAOD file produced on an earlier release cycle in a newer release cycle something that might become a requirement, or is it just hoped to work? (since strictly speaking backwards compatibility is guaranteed only for RAW)
it is something that we rely on quite heavily to reduce the number of times we have to start from AOD (or even earlier) in order to produce new NANO
> it is something that we rely on quite heavily to reduce the number of times we have to start from AOD (or even earlier) in order to produce new NANO
And this is really across release cycles, rather than backporting the necessary changes to the release cycle where the MiniAOD was produced?
Indeed, for two reasons (as far as I can see):
1) It is much easier to add a few modifiers to a newer release to make sure we can read different MINIAODs as input to NANO than to backport features across several releases (think from 13_3 to 12_4, 10_6).
2) Often the changes are actually in the MINIAOD content itself; we can recompute them on the fly in newer releases, while backporting would break the rule of preserving the event content.
All this being said, if the change is necessary we can work out the right time to do it to minimize the need to access older MINIAODs. In general this will require re-making certain MINIAOD campaigns starting from AOD in a release in which the OwnVector has been removed.
The main challenge I see here is with Run2 MINIAODs. It is by far easier to read them as they are to make new NANOAODs than to either backport changes or reproduce all of them starting from AOD.
> Just to remind that edm::OwnVector<TrackingRecHit> is also persisted in MINIAOD
The same comment holds true for almost all Tracker alignment and calibration ALCARECO flavours. In case there is a plan of something like a Run3 UltraLegacy re-reco you better plan it very carefully if people are supposed to re-use the Prompt ALCARECO for that purpose, otherwise a massive rereco just for calibration purposes might be necessary.
assign alca
New categories assigned: alca
@perrotta,@consuegs,@saumyaphor4252 you have been requested to review this Pull request/Issue and eventually sign? Thanks
> > Just to remind that edm::OwnVector<TrackingRecHit> is also persisted in MINIAOD
> The same comment holds true for almost all Tracker alignment and calibration ALCARECO flavours. In case there is a plan of something like a Run3 UltraLegacy re-reco you better plan it very carefully if people are supposed to re-use the Prompt ALCARECO for that purpose, otherwise a massive rereco just for calibration purposes might be necessary.
Thanks @mmusich for bringing this point up. Could you (or @cms-sw/alca-l2) elaborate why the same release cycle that was used to produce the ALCARECOs could not be used (or would be difficult to use) to derive these calibrations?
> elaborate why the same release cycle that was used to produce the ALCARECOs could not be used (or would be difficult to use) to derive these calibrations?
A variety of reasons, but mostly these two:
The only drawback I see for std::vector<std::variant<A,B,C>> is the explicit dependency on A,B,C instead of on the common base class.
If one knows that A,B,C have a common base class (say Z), runtime (dynamic) polymorphism is trivial (in the absence of multiple inheritance). See the example below.
At least for track/tracking the design is closed (the base class supports poor-man's RTTI for fast down-casting) and most of the derived classes are in a single Data Format package.
I hope ROOT will support "schema evolution" in case one adds/deletes a type from a variant.
#include <iostream>
#include <variant>
#include <vector>

// Common abstract base class of all variant alternatives
struct Z {
  virtual ~Z() {}
  virtual char operator()() const = 0;
};
struct A : public Z {
  ~A() override {}
  char operator()() const override { return 'A'; }
};
struct B : public Z {
  ~B() override {}
  char operator()() const override { return 'B'; }
};
struct C : public Z {
  ~C() override {}
  char operator()() const override { return 'C'; }
};

using ABC = std::variant<A, B, C>;

// Access whichever alternative is held through a reference to the base class
Z& toZ(ABC& v) {
  return std::visit([](auto& arg) -> Z& { return arg; }, v);
}

using Cont = std::vector<ABC>;

int main() {
  Cont c;
  c.emplace_back(A());
  c.emplace_back(B());
  c.emplace_back(C());
  // Each element is called through Z, so the virtual call picks A, B or C
  for (auto& v : c)
    std::cout << toZ(v)() << std::endl;
  return 0;
}
Are the aforementioned expectations of MINIAOD and (some?) ALCARECOs to be readable by later release cycles being tested in the IBs?
> At least for track/tracking the design is closed (the base class supports poor-man RTTI for fast down-casting) and most of the derived classes are in a single Data Format package
Are we 100 % sure the same classes (like edm::OwnVector<TrackingRecHit>) are not used to store e.g. muon hits/segments or FastSim hits? Or did you refer specifically to the BaseTrackerRecHit sub-hierarchy (which still includes the FastSim classes)?
> Are the aforementioned expectations of MINIAOD and (some?) ALCARECOs to be readable by later release cycles being tested in the IBs?
Yes, e.g. https://github.com/cms-sw/cmssw/blob/a2fe5aab0bfb328573d0d080b45d1c3ad38f2f01/RecoTracker/TrackProducer/test/BuildFile.xml#L33 for MINIAOD. For ALCARECO there's plenty of tests that exercise this (e.g. in Alignment/OfflineValidation
).
Looking at 2023 prompt reco, the following collections seem to be used in MINIAOD
edm::OwnVector<TrackingRecHit,edm::ClonePolicy<TrackingRecHit> >
edm::OwnVector<reco::BaseTagInfo,edm::ClonePolicy<reco::BaseTagInfo> >
edm::RangeMap<CSCDetId,edm::OwnVector<CSCSegment,edm::ClonePolicy<CSCSegment> >,edm::ClonePolicy<CSCSegment> >
edm::RangeMap<DTChamberId,edm::OwnVector<DTRecSegment4D,edm::ClonePolicy<DTRecSegment4D> >,edm::ClonePolicy<DTRecSegment4D> >
and the following in AOD (in addition to the ones in MINIAOD)
edm::RangeMap<CSCDetId,edm::OwnVector<CSCRecHit2D,edm::ClonePolicy<CSCRecHit2D> >,edm::ClonePolicy<CSCRecHit2D> >
edm::RangeMap<GEMDetId,edm::OwnVector<GEMRecHit,edm::ClonePolicy<GEMRecHit> >,edm::ClonePolicy<GEMRecHit> >
edm::RangeMap<GEMDetId,edm::OwnVector<GEMSegment,edm::ClonePolicy<GEMSegment> >,edm::ClonePolicy<GEMSegment> >
edm::RangeMap<RPCDetId,edm::OwnVector<RPCRecHit,edm::ClonePolicy<RPCRecHit> >,edm::ClonePolicy<RPCRecHit> >
so already the migration of (half of) the classes in category 2 (https://github.com/cms-sw/cmssw/issues/42734#issuecomment-1708901672) would run into conflict with the aforementioned backwards compatibility expectations.
@cms-sw/xpog-l2 From the core perspective we'd like to try out the migration of the edm::RangeMap<T, edm::OwnVector<T_Muon>> to edm::RangeMap<T, std::vector<T_Muon>> (i.e. the category 2), also as an exercise to see how complicated the migration process would be. The change would break (naive) backwards compatibility for both AOD and MINIAOD for these data products. One mitigation strategy could be to add EDProducers that convert the RangeMap<T, OwnVector<U>> to RangeMap<T, vector<U>> on the fly. Our first question is when such a change could be accommodated. Would it be useful to discuss in one of the future core software (or xpog) meetings?
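As a rough illustration of what such an on-the-fly conversion would have to do, here is a self-contained sketch in plain C++. SimpleRangeMap, OwningRangeMap, VectorRangeMap, convert and Segment are hypothetical stand-ins made up for this example; they do not reproduce the edm::RangeMap or EDProducer interfaces. The sketch assumes every stored object is exactly of type U, since copying by value would otherwise slice derived objects.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <typeinfo>
#include <utility>
#include <vector>

// Hypothetical simplified range map: objects stored contiguously,
// plus a [begin, end) index range per detector id.
template <typename Id, typename Storage>
struct SimpleRangeMap {
  Storage data;
  std::map<Id, std::pair<std::size_t, std::size_t>> ranges;
};

// Old-style storage (owning pointers) and new-style storage (values).
template <typename Id, typename U>
using OwningRangeMap = SimpleRangeMap<Id, std::vector<std::unique_ptr<U>>>;
template <typename Id, typename U>
using VectorRangeMap = SimpleRangeMap<Id, std::vector<U>>;

// Copy every object by value and keep the per-id index ranges unchanged.
template <typename Id, typename U>
VectorRangeMap<Id, U> convert(OwningRangeMap<Id, U> const& in) {
  VectorRangeMap<Id, U> out;
  out.data.reserve(in.data.size());
  for (auto const& p : in.data) {
    assert(p != nullptr);
    U const& ref = *p;
    // Guard against slicing: the element must not be of a derived type.
    assert(typeid(ref) == typeid(U));
    out.data.push_back(ref);
  }
  out.ranges = in.ranges;
  return out;
}

// Hypothetical payload type standing in for a muon segment or hit.
struct Segment {
  int chamber;
  double chi2;
};
```

Because element order is preserved, the index ranges can be copied as they are. A real converter would additionally have to deal with product labels and with anything that refers to the old collection, which this sketch leaves out.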
Hi @makortel, an in person discussion would indeed be great. Our XPOG meeting on Wednesday 2pm is usually not too busy, we can accommodate a slot for you in an upcoming one (next one is Oct. 2nd).
> an in person discussion would indeed be great. Our XPOG meeting on Wednesday 2pm is usually not too busy, we can accommodate a slot for you in an upcoming one (next one is Oct. 2nd).
Unfortunately next week (Oct 4) conflicts with O&C week. In the meantime @Dr15Jones prototyped the migration from the RangeMap<T, OwnVector<U>> in https://github.com/cms-sw/cmssw/pull/42917 . When would the following meeting be? Maybe we can use Chris' PR as a base for the further discussion?
Hi @makortel, the next meeting is going to be on the 18th at 2pm GVA time. The points we would like to discuss are:
Based on these two points we can either discuss in person at the meeting on the 18th (in case substantial effort will be required by POGs), or otherwise iterate here and in the relevant PRs.
I suspect that most of the edm::OwnVector<reco::Candidate> are homogeneous collections and can be replaced by a vector?) and then say a virtual method that returns a vector<reco::Candidate const*> and/or a virtual operator[] that returns the corresponding reco::Candidate const&. Of course edm should support the ability to consume a product declaring only a base class so that clients can get this new version of edm::OwnVector
> Of course edm should support the ability to consume a product declaring only a base class
This already exists via the use of edm::View<T>, where T is the base class.
ah ok excellent. Thought it was necessary to introduce a "translator" producer.
So one can save a vector<GenParticle> and a vector<PFCandidate> and consume either of those (or both) in a jetAlgo that consumes edm::View<reco::Candidate> without introducing any further magic, right?
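The pattern being described, homogeneous vectors of concrete types read through a base-class view, can be sketched in standalone C++ as follows. BaseView, ToyCandidate, ToyGenParticle, ToyPFCandidate and sumPt are hypothetical names for this illustration only; BaseView is a much simplified stand-in and does not reproduce the actual edm::View interface.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical toy hierarchy standing in for a candidate base class
// and two concrete candidate types.
struct ToyCandidate {
  virtual ~ToyCandidate() = default;
  virtual double pt() const = 0;
};
struct ToyGenParticle : ToyCandidate {
  explicit ToyGenParticle(double pt) : pt_(pt) {}
  double pt() const override { return pt_; }
  double pt_;
};
struct ToyPFCandidate : ToyCandidate {
  explicit ToyPFCandidate(double pt) : pt_(pt) {}
  double pt() const override { return pt_; }
  double pt_;
};

// Non-owning view: base-class pointers into a vector of any derived type.
// The viewed vector must outlive the view.
template <typename Base>
class BaseView {
public:
  template <typename Derived>
  explicit BaseView(std::vector<Derived> const& items) {
    static_assert(std::is_base_of_v<Base, Derived>, "Derived must inherit from Base");
    ptrs_.reserve(items.size());
    for (auto const& item : items)
      ptrs_.push_back(&item);
  }
  auto begin() const { return ptrs_.begin(); }
  auto end() const { return ptrs_.end(); }

private:
  std::vector<Base const*> ptrs_;
};

// A consumer that depends only on the base class.
double sumPt(BaseView<ToyCandidate> const& view) {
  double sum = 0.0;
  for (ToyCandidate const* c : view) {
    assert(c != nullptr);
    sum += c->pt();
  }
  return sum;
}
```

The consumer sumPt never names the concrete types, so the same function works on a std::vector<ToyGenParticle> and on a std::vector<ToyPFCandidate>, while the stored collections remain plain vectors of values.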
How far back is there practical value in maintaining backward compatibility? Is 10_6_X the oldest release which might have been used to create input files for future processes creating NANO from MINIAOD or doing whatever it is that ALCA will be doing or other similar use cases?
Is there some point where this backwards compatibility is already broken?
I ask this understanding that there has always been a hard requirement to be able to read RAW from some very early point in CMS history.
> I ask this understanding that there has always been a hard requirement to be able to read RAW from some very early point in CMS history.
Correct, we guarantee backwards compatibility for RAW (that also includes Scouting). And formally, that has been the only backwards compatibility guarantee, and for other data types the backwards compatibility has been on best effort basis (although in practice the schema evolution has worked quite widely).
> How far back is there practical value in maintaining backward compatibility? Is 10_6_X the oldest release which might have been used to create input files for future processes creating NANO from MINIAOD or doing whatever it is that ALCA will be doing or other similar use cases?
The scope we have planned so far is the creation of NanoAOD(SIM) / MiniAOD(SIM) from MiniAOD(SIM) / AOD(SIM) that were produced in the Run 2 Ultra Legacy processing (done in 10_6_X) in official processing campaigns, and the reading of ALCARECOs produced in the same Run 2 UL processing, to the extent that those are being read in CMSSW tests.
> Is there some point where this backwards compatibility is already broken?
In principle no, because both the Mini/AOD case and the ALCARECO case should be tested in CMSSW. In practice, at least one subtle issue has already crept in (https://github.com/cms-sw/cmssw/issues/43923), and one can always worry about the coverage of the tests.
PR #43931 deals with the "Clearly unused" cases.
PR #43987 takes care of OwnVector<SiStripMatchedRecHit2D>
PR #44047 deals with OwnVector<BaseTrackerRecHit>
PR #44063 deals with OwnVector<TrackingRegion>
ROOT's new RNTuple columnar data storage is not going to support dynamic polymorphism (as opposed to TTree). One such use case in our data formats is edm::OwnVector<T>, which effectively behaves as std::vector<std::unique_ptr<T>>, including the dynamic polymorphism (i.e. it can store owning pointers to objects of any class that inherits from T). The purpose of this issue is to list all the current uses of edm::OwnVector, discuss how to address them, and track the progress.
It should be noted that the deployment of the migration needs some thought, as the changes are very likely not backwards compatible (although, strictly speaking, the backwards compatibility of edm::OwnVector is not guaranteed across CMSSW release cycles).
I expect the framework group to do the majority of the work, but help from domain experts would also be very welcome.
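For readers less familiar with the behaviour in question, the following standalone sketch shows the std::vector<std::unique_ptr<T>> semantics that edm::OwnVector<T> is said above to resemble. Hit, PixelHit, StripHit, makeHits and describe are hypothetical names for this illustration and are not CMSSW classes or functions.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy hierarchy, only for illustration.
struct Hit {
  virtual ~Hit() = default;
  virtual std::string name() const = 0;
};
struct PixelHit : Hit {
  std::string name() const override { return "pixel"; }
};
struct StripHit : Hit {
  std::string name() const override { return "strip"; }
};

// One container owns objects of different derived types.
std::vector<std::unique_ptr<Hit>> makeHits() {
  std::vector<std::unique_ptr<Hit>> hits;
  hits.push_back(std::make_unique<PixelHit>());
  hits.push_back(std::make_unique<StripHit>());
  return hits;
}

// The concrete type of each element is resolved only at run time,
// through the virtual call on the base-class pointer.
std::string describe(std::vector<std::unique_ptr<Hit>> const& hits) {
  std::string out;
  for (auto const& h : hits) {
    assert(h != nullptr);
    out += h->name();
    out += ' ';
  }
  return out;
}
```

The element type stored on disk therefore cannot be inferred from the container type alone, which is the property that a storage format without dynamic polymorphism cannot represent directly.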