ajedele opened this issue 1 year ago
While the unpackers are experiment specific, I do not think that the resulting files describing the mapped data are necessarily so.
Two typical cases of changes might appear:
It would be great if every ext h101 file committed to r3broot would include the unpacker (name, git hash), date, path and exact invocation used to create that file.
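For illustration, such provenance could be a plain comment block at the top of the generated header. A sketch with entirely made-up example values (the unpacker name, git hash, path and --ntuple arguments here are invented, and the exact invocation differs per experiment):

```cpp
/* Hypothetical provenance block for an ext_h101 header; all values below
 * are invented examples, not taken from any real experiment:
 *
 *   unpacker:   upexps/202205_s522 (git 0123abc)
 *   date:       2022-05-17
 *   path:       /u/land/.../upexps/202205_s522
 *   invocation: ./202205_s522 --ntuple=RAW,STRUCT_HH,ext_h101.hh
 */
```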
If we were to auto-generate the h101 headers during compilation, this would imply that each compile can only work with a single experiment. I would favor the current approach where one can analyze different experiments with the same compile. (I would however favor CI tests with experiment unpackers so we can ensure that nobody breaks the compatibility of the h101 files in git with any unpacker.)
On a side note, I think the construction of the overall h101 for the ucesb reader in the C macros, complete with offsetof's and everything is something which should go. The overall struct consisting of N detector specific structs does not seem to be required anywhere. Instead, each reader class could tell the R3BUcesbReader (or source or whatever) the size of its h101 struct at runtime, and the R3BUcesbSomething could use the arithmetic technique of addition to figure out how large the buffer should be in total, and provide each reader with a pointer to their region in the buffer. For bonus points, this should be accomplished by a variadic method instead of passing void pointers.
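A minimal sketch of that buffer arithmetic, assuming hypothetical GetStructSize()/SetStructPtr() methods on the readers and a variadic RegisterReaders() on the source (none of these exist in R3BRoot under those names):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

class R3BReader  // stand-in for the existing reader base class
{
  public:
    virtual ~R3BReader() = default;
    virtual std::size_t GetStructSize() const = 0; // size of this reader's h101 struct
    virtual void SetStructPtr(void* ptr) = 0;      // this reader's region in the shared buffer
};

class R3BUcesbSource  // stand-in for the "R3BUcesbSomething"
{
  public:
    template <typename... Readers>
    void RegisterReaders(Readers&... readers)
    {
        // Total buffer size is simply the sum of the per-reader struct sizes.
        const std::size_t total = (std::size_t{0} + ... + readers.GetStructSize());
        fBuffer.resize(total);

        // Hand each reader a pointer to its own region; no monolithic
        // experiment-wide struct and no offsetof needed.
        std::size_t offset = 0;
        ((readers.SetStructPtr(fBuffer.data() + offset),
          offset += readers.GetStructSize()),
         ...);
    }

  private:
    std::vector<std::uint8_t> fBuffer;
};
```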
Of course, it could also be argued that there is no reason why the readers one wants to instantiate should be specified in the macro at all, as they are implied by the mapping. If the unpacker of an experiment provides data of some detector Foo, just instantiate R3BFooReader. (In that case, there should be some way to override this. When analyzing calibration runs of Califa with the s522 unpacker, one probably would not want to instantiate the readers for all of the other detectors. Of course, one fix would be to use the califa_only unpacker for that (if it were not for the misguided decision to encode the presumed gain settings of the califa preamplifiers in the channel mapping).)
Another improvement would be to allow multiple instances of one reader which read data of the same type, but with different name prefixes.
For example, if both the PSPs and CALIFA use FEBEX with the TUM firmware, we can already have the same code handling the unpacking. If we break the 1:1 mapping between ext h101 structs (data types) and the name string prefix (instances), we could also use two instances of the same R3BTumFebexReader(std::string srcPrefix) for reading both of them into two TClonesArrays of identical type but different names.
Perhaps @inkdot7 or @bl0x have some comments.
This seems related to the cleanup that is needed to address the issues of incomplete data mappings of #712 . (I think my comment dated Dec 15 could be relevant.)
Keeping generated files away from version control systems is a good idea. (Edit: to avoid misunderstandings: This means to either generate them on-the-fly, or avoid them at all. Keeping files outside version control is not an improvement.)
One thing that should for sure be changed is to stop using the `..._LAYOUT_INIT` items - they are not nice. `git grep LAYOUT_INIT` should then show nothing. That would be a step on the way.
Hmm, perhaps they are not used, just copy-pasted? Then much easier to get rid of them. Then the comment of Dec 15 could be re-read.
I should perhaps clarify a bit:
The structures declared/used in r3broot I imagine to be hand-crafted/cleaned versions of what may possibly have come from ucesb as a suggestion for what structure members are available for a detector (i.e. a generated header). Likewise, the `EXT_STR_xxx_ITEMS_INFO` structure for that detector (or even part of a detector) is a (possibly sanitized) version, and then gradually modified if new members are added. That information is given to the ext_data interface on the reading end (r3broot), by the `ext_data_structure_info` means in the call to `ext_data_setup()`.
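For reference, the per-detector pieces in the committed headers have roughly this shape - a trimmed illustration modeled on the CALIFA_ENE macro quoted further down in this thread, not a verbatim copy of any real file (real headers list every member, including the index/value arrays):

```cpp
#include <stdint.h>

/* Hand-maintained destination struct for one detector (members elided). */
typedef struct EXT_STR_h101_CALIFA_t
{
  uint32_t CALIFA_ENE;
  /* ... */
} EXT_STR_h101_CALIFA;

/* The matching ITEMS_INFO macro: one EXT_STR_ITEM_INFO_* line per member.
 * This is the information handed to ext_data_setup() via
 * ext_data_structure_info on the reading end. */
#define EXT_STR_h101_CALIFA_ITEMS_INFO(ok,si,offset,struct_t,printerr) do { \
  ok = 1; \
  EXT_STR_ITEM_INFO_LIM(ok,si,offset,struct_t,printerr,\
                     CALIFA_ENE,                       UINT32,\
                    "CALIFA_ENE",4864); \
  /* ... */ \
} while (0);
```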
From the writing end, the ext_data interface gets a list of everything which is available from a certain unpacker. It does not need any generated header file for that. It always comes with the data!
There is thus no need to generate any headers again and again, e.g. during compilation. There is also no need to dump them for full experiments, since the bits and pieces are known by the substructures in r3broot.
UCESB has been fixed to not include the deprecated `LAYOUT_INIT` items in generated headers (unless particularly requested with `LAYOUT` next to `STRUCT_HH`). Also found+fixed a small bug when hunting around, so thanks for bringing this up.
The `LAYOUT_INIT` stuff is gone! #859
Hi all,
As Haakan mentioned, 'Keeping generated files away from version control systems is a good idea.' That is my suggestion.
Most of our Readers are written to be versatile in regards to detector and channel numbers. They extract this information from the mapping information in the ext_h101_{det name}.h file. The issue is that for 'frozen' detectors, the mapping in the unpacker is consistent and you can copy-paste the ext files from a previous experiment. However, the number of truly 'frozen' detectors is very small. In addition, the naming convention and channel mapping are commonly not 1-to-1 from experiment to experiment. For example, the mapping and naming conventions of the tofd have slightly deviated for every experiment.
Most students who do the calibrations do not know that the ext files are generated files. They just use the ext_h101_tofd.hh files in the Git repository. This is also not clearly specified anywhere in the code base, which is not friendly to basic users. If (for example) you use the ext_h101_tofd.hh file from a previous experiment where the mapping and naming convention are not the same, the code should fail. This does not always happen (especially if it is a mapping problem).
I do NOT want the ext files to be generated during the compile. That's overkill. Rather, the ext files should just be removed from version control and copied over for each experiment. Alternatively, as Philipp mentioned, we could just add the experiment to the end of the ext file name. I think that might be the easiest - especially since the current naming convention is all over the place and the ext file repository needs cleaning up.
From my understanding, the typical ext_h101 file defines a struct with some member variables and a macro to map these to the corresponding data fields in the stream by name. (Onion data structures are just byte-compatible with the flat structures and are thus initialized with the same macro, it seems.)
As the mapping between struct member names and strings in the macro is 1:1, I would assume that a mapping file which has different strings for mapping fields from the stream will also have different names in the struct (i.e. will not compile with the reader code). I had a look at the ext_h101_tofd.h for S522 and S454 and the onion struct does not look very different (apart from array sizes differing, which seems fine (if you have the largest size in git)).
The actual mapping being different is a separate problem. I do not have any good solution for mapping, generally. Doing the mapping in R3BRoot will involve tons of illegible parameter containers. Mapping in the unpacker (which is what we do) avoids this, but it turns out that there are mapping-adjacent problems (like "what is the febex path of that califa channel?" or "what is the trigger timing I have to use with that TofD channel?") which do not lend themselves to be encoded with a permutation. \footnote{Not impossible, though: califa encodes the gain setting as part of a channel id. Still, I very much disrecommend using the most significant byte of the TofD channel index to store the index of the trigger timing channel to use with that.} In practice, this means that the mapping is all over the place: the main mapping happens in the unpacker, and either the complete mapping or ancillary parts of it are also accessed from R3BRoot.
`ext_data_fetch_event()` does remapping of item locations from the source to the destination structure based on names, as given by `*struct_info` in the call to `ext_data_setup()`.
```c
EXT_STR_ITEM_INFO_LIM(ok,si,offset,struct_t,printerr,\
  CALIFA_ENE, UINT32,\
  "CALIFA_ENE",4864); \
```
From my understanding, this maps the h101 field identified by the string "CALIFA_ENE" to the struct member called CALIFA_ENE, correct?
In principle, one could also smuggle in a variable prefix in the string so that
```cpp
struct h101_califa_munich
{
    uint32_t ENE;
    // ...
};
```
can get mapped to "CALIFA_ENE" or "PSP_ENE" at runtime. This would allow using a generic R3BMunichFebexReader to handle both cases.
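A minimal sketch of what such a generic reader could look like; the AddMapping() helper and the constructor shown here are assumptions for illustration only, the real registration goes through the EXT_STR_ITEM_INFO_* macros / ext_data_structure_info:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// The generic destination struct from above.
struct h101_califa_munich
{
    uint32_t ENE;
    // ...
};

class R3BMunichFebexReader
{
  public:
    explicit R3BMunichFebexReader(std::string srcPrefix)
        : fPrefix(std::move(srcPrefix))
    {
        // The prefixed string is what gets matched against the h101 data;
        // the bare member name only fixes the offset in the local struct.
        AddMapping(fPrefix + "_ENE", offsetof(h101_califa_munich, ENE));
    }

  private:
    // Placeholder for whatever actually registers (name, offset) pairs
    // with the ext_data interface.
    void AddMapping(const std::string& /*name*/, std::size_t /*offset*/) {}

    std::string fPrefix;
    h101_califa_munich fData{};
};

// Two instances of the same reader class, one per name prefix:
// R3BMunichFebexReader califa("CALIFA"); // matches "CALIFA_ENE"
// R3BMunichFebexReader psp("PSP");       // matches "PSP_ENE"
```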
(trying to post this by phone per mail, apologies if it goes horribly wrong).
@klenze yes. And the name 'smuggling' too! I.e. the string name is what is matched, the 'bare' name is what is used to find the actual offset in the local (=destination) structure.
Could it be a better solution that we have a separate git repository for all ext header files and include this separate repository as a git submodule under this repository?
We could tag the version of the ext header repository according to the experiment name before each beam time. If someone needs to analyze old data, it is also very flexible to change the version of the ext header repository back to a specific experiment version while continuing to use the latest version of R3BRoot.
A candidate place to store them would be in the experimental upexps dir for each experiment, as they are generated from there, e.g. -> /u/land/fake_cvmfs/10/cvmfs_fairroot_v18.8.0_fs_nov22p1_extra/upexps/202402_s091s118/ext/ext...
> Could it be a better solution that we have a separate git repository for all ext header files and include this separate repository as a git submodule under this repository?
>
> We could tag the version of the ext header repository according to the experiment name before each beam time. If someone needs to analyze old data, it is also very flexible to change the version of the ext header repository back to a specific experiment version while continuing to use the latest version of R3BRoot.
Dear @YanzhaoW @lr11345, what you are proposing is duplication of code. That is the opposite of what I suggested and something which I will not deal with. In fact, the duplication in upexps is something which I have for a very long time pointed out as not being good, and asked to be cleaned up.
Dear @inkdot7
I agree. But I'm not sure it's possible to always keep this backward compatibility for all detectors. For NeuLAND, sure, an ext header file with trigger for 13 dps would also work for 12 dps without trigger (I will test this), iff we never change the var names or var types. But I don't know whether this could also be applied to all other detectors. If yes, we should just use the latest version of the ext header for old experiments as well. If no, then a git submodule may be an option.
> I agree. But I'm not sure it's possible to always keep this backward compatibility for all detectors. For NeuLAND, sure, an ext header file with trigger for 13 dps would also work for 12 dps without trigger (I will test this), iff we never change the var names or var types.
If there is a need (or simply better names come up) then change all experiment analysis / unpackers to use those names. That will of course be much easier with less copy-paste in upexps.
> But I don't know whether this could also be applied to all other detectors.
Most of them at least. Begin there, and hopefully the joy spreads (i.e. the amount of code in git shrinks).
> Could it be a better solution that we have a separate git repository for all ext header files and include this separate repository as a git submodule under this repository?
I absolutely would not recommend subjecting our users to git submodules.
@inkdot7: A practical problem @ajedele has encountered is that for some experiments, ROLU ran in a separate TDC (which has its own trigger timing channels which can and must be mapped) and for other experiments, ROLU ran in the same TDC module as LOS, so the trigger signals would have to be mapped to both LOS and ROLU, which I vaguely recall not working?
Also useful would be the ability to declare a signal which gets mapped to nothing (or a zero element signal array) to make a consumer happy. Of course I fully expect you to tell us to actually take ownership of our h101 in the consumer and just use EXT_DATA_ITEM_FLAGS_OPTIONAL. :-)
Looking at the PSP h101 file, it seems to me that the number of preprocessor lines generated scales with the number of detectors.
A decent language would provide a way to generate all these by just feeding the onion version of the psp struct into an appropriate macro, but the metaprogramming facilities of C++ (templates, preprocessor) are not suitable for that.
> @inkdot7: A practical problem @ajedele has encountered is that for some experiments, ROLU ran in a separate TDC (which has its own trigger timing channels which can and must be mapped) and for other experiments, ROLU ran in the same TDC module as LOS, so the trigger signals would have to be mapped to both LOS and ROLU, which I vaguely recall not working?
>
> Also useful would be the ability to declare a signal which gets mapped to nothing (or a zero element signal array) to make a consumer happy. Of course I fully expect you to tell us to actually take ownership of our h101 in the consumer and just use EXT_DATA_ITEM_FLAGS_OPTIONAL. :-)
Yes. :-)
To not duplicate code, it would be useful to track which detectors are measured on a global clock, and which are measured 'just' relative to a master start signal.
(Thanks for pointing this out, as it explained what is meant by trigger time channels.)
> Looking at the PSP h101 file, it seems to me that the number of preprocessor lines generated scales with the number of detectors.
Sounds very plausible. Is unfortunate.
> A decent language would provide a way to generate all these by just feeding the onion version of the psp struct into an appropriate macro, but the metaprogramming facilities of C++ (templates, preprocessor) are not suitable for that.
Yes, the proposal is to do that semi-manually, i.e. to extract the core out of the ext_files and then do some loop around that to set up multiple detectors.
> I absolutely would not recommend subjecting our users to git submodules.
Why?
> A decent language would provide a way to generate all these by just feeding the onion version of the psp struct into an appropriate macro, but the metaprogramming facilities of C++ (templates, preprocessor) are not suitable for that.
Do you mean compile-time reflection?
> Why?

Submodules share some properties of the much-detested macros repository, though the details differ. The increased flexibility comes at the price of increased complexity. Anyone doing a PR which changes both an h101 file and a reader in a forward- and backward-incompatible way would have to do two PRs. CI would have to use the h101 commit before it is really merged. Users would have to go through additional steps to get a compilable checkout.
I think the main use case for git submodules might be instances of Conway's law. If you have a handful of different teams working with a frequently changing lightweight interface (such as a protocol or file format), then it might make sense to put that interface into a git submodule in all of their projects. (I dunno, I have never designed such a system.)
Maintaining one h101 branch per experiment would not be a good use of git submodules (where a commit in the parent repository always contains the hash of a single commit in the child repository).
Additionally, most of the arguments against the original proposal to auto-create the h101 files also apply: one would end up with `#if CALIFA_h101_has_TOT` in the reader or something.
Anecdotally, the CALIFA DAQ originally used submodules within submodules. That was already a hassle even though I had push permissions, and I flattened everything into a single repository. (We have an independent repo for per-machine configurations, but that is in a .gitignored subdirectory.)
> Do you mean compile-time reflection?

Yes.
The external files are generated for each experiment by the unpacker. The external files are used by the readers to map the channels correctly.
The external files are experiment specific; they should be generated by the experiment-specific unpacker and copied over for each experiment. Otherwise, this leads to potentially incorrect mappings, channel numbers, ADC ranges, or trigger information. Our naming and mapping change often enough that this will be an issue.
Since this is the first step of the analysis, most new users will be unaware that the external files may be incorrect.