Open makortel opened 4 years ago
assign core
New categories assigned: core
@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks
A new Issue was created by @makortel Matti Kortelainen.
@Dr15Jones, @silviodonato, @dpiparo, @smuzaffar, @makortel can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
@pcanal suggested to look into TClass::GetMissingDictionaries()
or TDictionary::GetDictionary()
.
This issue is related to investigations done as part of https://github.com/cms-sw/cmssw/issues/27277.
Mostly to document for myself. I replaced https://github.com/cms-sw/cmssw/blob/cea1d1bc4aa7b376b68eb70a34cfb4c7184f438b/FWCore/Reflection/src/TypeWithDict.cc#L808-L815 with
// A related free function
bool hasDictionary(std::type_info const& ti) {
return TDictionary::GetDictionary(ti) != nullptr;
}
after which my test failed with
An exception of category 'DictionaryNotFound' occurred while
[0] Constructing the EventProcessor
[1] Calling OutputModuleBase::keepThisBranch, checking dictionaries for kept types
Exception Message:
No data dictionary found for the following classes:
std::array<int,4>
Most likely each dictionary was never generated, but it may
be that it was generated in the wrong package. Please add
(or move) the specification '<class name="whatever"/>' to
the appropriate classes_def.xml file along with any other
information needed there. For example, if this class has any
transient members, you need to specify them in classes_def.xml.
Also include the class header in classes.h
That exception is thrown from
https://github.com/cms-sw/cmssw/blob/cea1d1bc4aa7b376b68eb70a34cfb4c7184f438b/FWCore/Framework/src/one/OutputModuleBase.cc#L147-L150
where the checkDictionary()
overload is this one
https://github.com/cms-sw/cmssw/blob/cea1d1bc4aa7b376b68eb70a34cfb4c7184f438b/FWCore/Reflection/src/DictionaryTools.cc#L92-L100
The branch is fired because typeWithDict.invalidTypeInfo() == true
, which translates to
https://github.com/cms-sw/cmssw/blob/cea1d1bc4aa7b376b68eb70a34cfb4c7184f438b/FWCore/Reflection/src/TypeWithDict.cc#L355
The isPointer()
and isArray()
both return false
, so *ti_ == typeid(dummyType)
must hold.
The TypeWithDict::ti_
can be set to dummyType
only in this constructor (when TypeWithDict::class_ != nullptr
)
https://github.com/cms-sw/cmssw/blob/cea1d1bc4aa7b376b68eb70a34cfb4c7184f438b/FWCore/Reflection/src/TypeWithDict.cc#L318-L328
when TClass::GetTypeInfo() == nullptr
, which is the case with std::array<T, N>
(confirmed with a printout there, and in ROOT prompt).
I need to think a bit more how to proceed from here.
I just wanted to check on the status
he isPointer() and isArray() both return false,
Shouldn't isArray() return true here? (I.e. should it be expanded to also return true for std::array)?
he isPointer() and isArray() both return false,
Shouldn't isArray() return true here? (I.e. should it be expanded to also return true for std::array)?
Good question. I need to understand if we use isArray()
anywhere in a way that would need to differentiate between std::array<T ,N>
and T[N]
.
On the other hand isArray() == true
might not help much, TypeWithDict::invalidTypeInfo()
would still return true
, and
https://github.com/cms-sw/cmssw/blob/cea1d1bc4aa7b376b68eb70a34cfb4c7184f438b/FWCore/Reflection/src/DictionaryTools.cc#L92-L100
would add the type to missingDictionaries
. This makes me wonder if that check, or the call from OutputModuleBase::keepThisBranch()
are fully appropriate (for std::array
). Of course it could be that changing isArray() == true
would lead to another code branch to be taken elsewhere.
Of course it could be that changing isArray() == true would lead to another code branch to be taken elsewhere.
It "should" since a C-style would not have a TClass and would not have a typeinfo stored by ROOT.
TypeWithDict::invalidTypeInfo() would still return true
Indeed, since the dictionary for the std::array is not generate we do not acquire the type_info for it (we 'could' but it would require header parsing at run-time which we are trying to avoid).
need to understand if we use
isArray()
anywhere in a way that would need to differentiate betweenstd::array<T ,N>
andT[N]
.
Fortunately all calls to TypeWithDict::isArray()
seem to be within TypeWithDict
. If we'd change it to return true
for std::array
, the consequences would be
TypeWithDict::typeInfo()
would throw an exception https://github.com/cms-sw/cmssw/blob/d7f103c459dd195eaa651ce0c30e4bd4bfd0fe43/FWCore/Reflection/src/TypeWithDict.cc#L371-L375
On the other hand it should throw an exception already now because of *ti_ == typeid(dummyType)
TypeWithDict::getClass()
would return nullptr
instead of TClass *
corresponding the std::array
TypeWithDict::isClass()
would return false
instead of true
TypeWithDict::name()
would return T[N]
instead of std::array<T, N>
TypeWithDict::arrayDimension()
would return a value instead of failing an assertionTypeWithDict::maximumIndex()
would return a value instead of failing an assertionTypeWithDict::toType()
would return a TypeWithDict
corresponding T
instead of std:array<T, N>
(i.e. a copy)The consequences of TypeWithDict::isClass()
returning false would be
TypeWithDict::isTemplateInstance()
would return false
instead of true
TypeWithDict::dataMemberSize()
would return 0 instead of 1TypeWithDict::functionMemberSize()
would return 0
instead of 33
TypeWithDict::dataMemberByName()
would return default-constructed MemberWithDict()
TypeWithDict::functionMemberByName()
would return default-constructed FunctionWithDict()
TypeWithDict::templateArgumentAt()
would return default-constructed TypeWithDict()
I didn't continue further, but based on this I'm not sure if treating std::array<T, N>
exactly like T[N]
would be sufficient.
DictionaryTools.h
has two overloads for the checkDictionary()
(the definition of hasDictionary()
can be seen in https://github.com/cms-sw/cmssw/issues/30493#issuecomment-653178717
There are also two overloads for checkDictionaryOfWrappedType()
https://github.com/cms-sw/cmssw/blob/b06207136c07abe4658ec38988715e9c4b52ddf5/FWCore/Reflection/interface/DictionaryTools.h#L25-L27
that both call the overload 2 of checkDictionary()
.
The checkDictionary()
is called from
global
, limited
, and one
https://github.com/cms-sw/cmssw/blob/b06207136c07abe4658ec38988715e9c4b52ddf5/FWCore/Framework/src/one/OutputModuleBase.cc#L139-L141
The two overloads of checkDictionary()
were added in https://github.com/cms-sw/cmssw/pull/15198. Before, there were
I'm currently trying to understand why the overload 2 of checkDictionary()
is needed.
Overload 1 does (explicitly but somewhat obscurely):
while Overload 2 assumes that the dictionary library library is already loaded (and that the answer is already compiled in TypeWithDict object).
I tried to just call the checkDictionary()
overload 1 from overload 2, but didn't work because the OutputModule takes the TypeWithDict
from the BranchDescription
, which in turn initializes the TypeWithDict
from the type name here
https://github.com/cms-sw/cmssw/blob/b05f18a5d7d6c83b04c4101b9e7ff83e47a59594/DataFormats/Provenance/src/BranchDescription.cc#L164
I'm not really sure why the (unwrapped) TypeWithDict
is initialized from type name also when a TypeWithDict
is given as an argument to the constructor of BranchDescription
https://github.com/cms-sw/cmssw/blob/b05f18a5d7d6c83b04c4101b9e7ff83e47a59594/DataFormats/Provenance/src/BranchDescription.cc#L78
and that TypeWithDict
is initialized from std::type_info
here
https://github.com/cms-sw/cmssw/blob/b05f18a5d7d6c83b04c4101b9e7ff83e47a59594/FWCore/Framework/src/ProductRegistryHelper.cc#L77-L78
(still need to investigate where that std::type_info
originates from)
Another attempt I did was to replace the checkDictoinary()
overload 2 along what I did with overload 1 in https://github.com/cms-sw/cmssw/issues/30493#issuecomment-653178717 with
bool checkDictionary(std::vector<std::string>& missingDictionaries,
std::string const& name,
TypeWithDict const& typeWithDict) {
if ((!bool(typeWithDict) || typeWithDict.invalidTypeInfo()) and not hasDictionary(typeWithDict.name())) {
missingDictionaries.emplace_back(name);
return false;
}
return true;
bool hasDictionary(std::string const& typeName) {
return hasDictionary(typeName.c_str());
}
bool hasDictionary(char const* typeName) {
return TDictionary::GetDictionary(typeName) != nullptr;
}
This allowed me to pass the checkDictionary()
checks, but then I encountered a call to BranchDescription.unwrappedTypeID()
in
https://github.com/cms-sw/cmssw/blob/b05f18a5d7d6c83b04c4101b9e7ff83e47a59594/FWCore/Framework/src/one/OutputModuleBase.cc#L159-L160
which leads to a call to TypeWithDict::typeInfo()
, which then throws an exception because the held type_info
is dummy. That raises an additional question if we really need a valid std::type_info
there (based on quick look I'd say "yes").
On the other hand the consumes()
call OutputModuleBase
does work for basic types (like int
). If I guess correctly, the TypeWithDict
have a valid std::type_info
for basic types because of
https://github.com/cms-sw/cmssw/blob/b05f18a5d7d6c83b04c4101b9e7ff83e47a59594/FWCore/Reflection/src/TypeWithDict.cc#L173-L212
I tried to just call the checkDictionary() overload 1 from overload 2,
That seems wrong. Overload 2 knows it does not need overload 1. What is the goal trying this?
I tried to just call the checkDictionary() overload 1 from overload 2,
That seems wrong. Overload 2 knows it does not need overload 1. What is the goal trying this?
I wanted to see what happens if I skip the "answer already compiled in TypeWithDict
" given that for std::array<T, N>
the information in TypeWithDict
is wrong in some way (and the exact way it is wrong is what I'm trying to understand).
I'm not really sure why the (unwrapped)
TypeWithDict
is initialized from type name also when aTypeWithDict
is given as an argument to the constructor ofBranchDescription
On the other hand it would certainly be inconsistent if TypeWithDict::byName()
and TypeWithDict::byTypeInfo()
to give different answers regarding whether the TypeWithDict
has a valid std::type_info
or not.
Matti asked me to look into this. Below are some comments about what I found (I know this repeats some parts of the discussion above). Matti asked me to answer two questions, plus there is an extra comment at the end.
First Question - How hard is it to make CMSSW work with std::array as a product type?
This is the major problem. TClass::GetClass("std::array<int,4>") will return a non-null pointer to a TClass. If you use the pointer to call GetTypeInfo(), then you get a nullptr. We use the type_info. The lookup tables that we use to find data items depend on the type_info (through the TypeID, but that is mostly a wrapper around a type_info*). The ProductResolvers are ordered based on the TypeID. The whole getByToken design is based on this.
The type_info is used by the output module when it declares the products it consumes. That is why there is a dictionary check for the products an output module keeps.
Independent of the type_info, we use the dictionary in the code that supports View. If we want to support View access into std::array or things like std::vector of std::array, we would need the dictionary for std::array.
There could be other issues. Things like cutParser and expressionParser use ROOT dictionaries. There are things that go directly to TClass to do various things. I would not be surprised to find other problems that would result from writing std::array's to output files.
My first thought is that the best thing is simply to say we do not support std::array as a product type, because ROOT does not support them in the same way it supports other types. One can use a std::vector as a substitute for a std::array in a data product. Also, I did a test. One can at least read and write data products that contain a std::array as a data member. A simple class that wraps a std::array should work already, at least as far as the Framework is concerned (other things might have problems with that, I don't know).
I tried to think of some alternatives:
For produced products, we can get the type_info from the "produces" declaration in the EDProducer. It would take some development effort to make this work, but those type_info's are available. This does not work for products read from the input. For those products, we have a string with the class name and that is all that is saved in the file in the persistent part of the BranchDescription. We use TClass to convert the class-name string into a type_info. I do not see an easy alternative for this case. For transient products this is not a problem, but for non-transient products this is a major problem.
I do not see any easy alternate way to get the type_info. Here are some ideas, but I don't like any of them:
Convince ROOT to support std::array in a way that allows getting the type_info.
We could manually code a function like this with cases for each full specialization of std::array:
type_info const& getTypeInfo(std::string const& className) {
if (className == "std::array<int, 4>") {
return typeid(std::array<int, 4>);
} else if (... all the other cases we use manually implemented ...
...
}
Use some reflection system other than ROOT dictionaries.
We could redesign our getByToken infrastructure to work on strings with the class name. This wouldn't be easy and would probably run slower. We could also redesign EDConsumerBase so that it doesn't depend on the type_info.
If we wanted to support Views into std::array, we would also need to do work there. This would go beyond being able to get the type_info.
Second question - What do we use dictionaries for in the Framework? Matti asked this question while thinking about whether we can allow transient products without dictionaries.
We use the dictionaries to translate a class name to a type_info. We use this type_info in our product lookup system (getByToken) and also in EDConsumerBase. Note that with some significant work, we could always get the type_info from the "produces" declaration in the EDProducer. Transient products are always "produced", never from the input file.
We use dictionaries in the code that implements Views. (Currently we need dictionaries for the top level class, the Wrapper, the contained class and the base classes of the contained classes. We already take advantage of the fact that we do not need dictionaries for all constituents of transient classes)
We use the dictionary of the wrapped type to control whether a class is transient or not (also can set the basketSize and splitLevel in classes_def.xml, these are accessible only from the dictionary). The critical thing is you currently need the dictionary of the wrapped type to determine whether or not a type is transient.
We use dictionaries to identify which Run and Lumi products are mergeable. This only applies to Run and Lumi products.
One unrelated other thing that I noticed while doing this. There is a BranchDescription constructor that has as an argument a TypeWithDict. That argument is used to initialize something, but before that value is ever used it gets overwritten. We could delete that argument and all the code related to it without affecting anything.
(still digesting but a quick note):
things like std::vector of std::array
nesting of std::array is not yet supported by the I/O
For 1 (and somewhat 2) (TClass returns a typeinfo or getTypeInfo special case std::array), the major question/trade-off is that we need to either: a. enumerate all std::array instances used (in selection.xml or getTypeInfo) [ Error prone / missing instances ] b. use the interpreter to produce the type info [ slightly slower, increase memory use, potentially significantly ] c. auto-generate the dictionary for the std::array (similar to STL collection) [ doesn't support top level std::array, only those nested in a class or struct ]
@wddgit Thanks for the detailed information. I am reviewing if there is a good practical solution.
https://github.com/cms-sw/cmssw/pull/43076 provides a workaround for storing std::array directly
Current
std::array<T, N>
products can not be put into the event because framework complains from missing dictionary.std::array
is one of those types that ROOT handles internally without an explicit dictionary. Our current check (or one of them) https://github.com/cms-sw/cmssw/blob/b0ddd4e29c3f1b1e04ce77cdb7d49d15098f46b1/FWCore/Reflection/src/TypeWithDict.cc#L808-L815 handles built-in types as a special case, and then asks fromTClassTable
if the type has a dictionary. Forstd::array
theTClassTable::GetDict()
returns anullptr
, and therefore the framework code thinks thestd::array
to not have a dictionary.We should look for a better way to ask from ROOT if its parser system can interact with a type
T
.