faustus123 opened 6 months ago
One question: How do you build the `TMemFile`s on the writing side? (Or plan to build)
You can see a pair of programs that do the serialization/deserialization and zmq messaging here. These are independent of the PODIO library. They are specific to the ROOT file format of the ePIC simulated data files in that they assume 4 `TTree`s, with the event data kept in "events".
https://github.com/JeffersonLab/SRO-RTDP/tree/davidl_podio2tcp/src/utilities/cpp/podio2tcp
How this is implemented on the ePIC side is via a JANA `JEventSource` plugin. A rough prototype of that can be seen here. It mostly copies the one already in EICrecon that uses `ROOTReader::openFiles()`. This will need a lot of cleanup, but the core functionality is there.
My main concern for this approach is that it ties the streaming of Frames rather tightly to ROOT's `TTree` backend. I would have tried to serialize on the FrameData level, which should be possible without any dependency on a specific backend. However, since you already have a proof-of-principle for the full chain, your approach is potentially a bit easier to implement (at least on the podio side), since a lot of the heavy lifting is done elsewhere. You probably have a better feeling for how involved the overall approach is.
I agree 100% that a better implementation will be to do this at the Frame level. This approach was quicker for me to implement and didn't require my digging too deep into the PODIO source. It will serve my short term goals until someone more familiar with PODIO can do it the right way. I also think `openTDirectory()` still adds some flexibility that may be useful in some other cases.
Another somewhat smaller concern is performance. When I first implemented Frame I/O I didn't pay too much attention to the things that happen "only once", e.g. when opening a file, or when first setting up a category for reading the first entry. Hence, I crammed quite a bit of (potentially) unnecessary work into that step, since the overhead will be distributed in any case. I am not entirely sure how things look if this "overhead" happens repeatedly for just a few events. In case all of the events that you stream have the same metadata and the same categories, there might be a way to cache a bit more of this information between calls to `openTDirectory` in case performance becomes an issue. Until that is the case, I would prefer the current version, since it has fewer assumptions and is more readable.
Yes, performance is not great, but it is currently obscured by the horribly slow ePIC reconstruction code. When reading from a file, it takes 2-5 seconds to reconstruct an event. Preliminary testing with pre-formed send buffers shows that sending takes only about 0.02 seconds/event.
When it comes to reading in the meta-data, one knob we have to turn is how many events are sent at once. I can just as easily send 100 events or more at a time which does reduce that overhead. I'll tune that as needed when I start scaling up for more performance testing.
Thanks for the links. After a quick look and IIUC the main use case here is to read from a stream in Jana2, but the source of the stream would then be outside of Jana2? I am mainly asking because I was wondering whether you also planned to implement (or actually need) the creation of the `TMemFile` somewhere in memory. Currently, it seems that everything starts from an existing podio file on disk.
Correct, there is no need for a JANA2 process on the sending side.
I do not anticipate using `TMemFile` in any production streaming system. It is just the (relatively) easy way to serialize ROOT objects, which is the form the simulated ePIC data happens to be in. My expectation is that the DAQ WG will decide on some custom format for the stream data, driven by the customized nature of the electronics. We may deal with that either in a specific JANA2 `JEventSource` or by some future transformer code that converts it into a PODIO stream (whatever that design turns out to be).
It is true that at the moment and for the next couple of years we will likely only have simulated data and it will be in the form of podio files on disk. Hence the need to generate stream(s) from that so the SRO infrastructure can start development.
I haven't heard anything on this for 2 months. Are there any plans to merge it or any hold-ups preventing it?
@faustus123 - apologies for having it dormant for so long. We've been doing some refactoring which we just managed to finish. If you can update your PR I can deal with it this week.
Uggh. I just tried to rebase to bring my fork up to date and a lot of changes have happened in the ROOTReader files in the meantime. It is going to take a while to try and resolve this. It would probably be more efficient if I do this after you have finished with whatever work you are doing there. Please let me know when you are finished with the current round of work and I will take a look at it then.
The changes that we want to go in first are the ones in #625. Looking at the changes in this PR, I am wondering whether cherry-picking might be easier for you than rebasing?
BEGINRELEASENOTES Add ROOTReader::openTDirectory() to allow external management of TDirectory (e.g. TMemFile) to support streaming. ENDRELEASENOTES
This is motivated by issue #565. It makes it possible to write a stream source that sends data packets in the form of serialized `TMemFile` objects using ROOT's built-in serialization/deserialization functionality. Each `TMemFile` may contain a small number of events in the "events" tree plus the complete metadata trees, allowing the other standard `ROOTReader` methods to work. I have a working test case that reads events from an existing PODIO ROOT file and sends them this way via zmq messages. This couples with the ePIC recon code and works, so there is a proof-of-principle that this works as needed.

A couple of notes on what was done:
- A `TTree* m_metaTree` member was added to take the place of the `TChain` in most of the calls for which a `TChain` is not actually needed. When a `TChain` is actually used, this just points to the `TChain`.
- The `TChain` was changed from being kept in a `unique_ptr` to being a direct member of the class. I suspect the `unique_ptr` trick was used because `TChain` has its copy and assignment constructors unavailable as private members of the `TChain` class, and that causes some issues with storing these in a `std::map`. I ran into other complications when trying to implement `m_metaTree` to support both cases (an externally supplied `TTree` or an internal `TChain`). I resolved this by explicitly deleting the copy and assignment constructors for `ROOTReader`, which avoids the problem of ownership of the `TChain`. Removing the `unique_ptr` mechanism was then possible, which simplified things in the class. There may have been other reasons for using `unique_ptr` which I don't know about.
- A new test, `read_frame_root_tdirectory`, was added which is almost a complete copy of `read_frame_root_multiple`. I can't claim to fully understand the testing configuration, but hopefully this is in line with what you would like.