DUNE-DAQ / minidaqapp


New NetworkManager integration #105

Closed: floriangroetschla closed this 2 years ago

floriangroetschla commented 2 years ago

The changes can be tested by following the current instructions in the minidaqapp wiki on how to use the NetworkManager, and additionally checking out the fgrotsch/NetworkManager branch of readoutlibs and the feature/NetworkManager branch of readoutmodules. Once this is verified to work, the wiki can be updated.

bieryAtFnal commented 2 years ago

This is very early feedback, but I need to call into a meeting in a few minutes, and I wanted to let you know about a possible issue as soon as possible. I'm not yet sure how real the problem is. I saw this when trying to use software TPG:

2021-Nov-18 08:50:36,787 ERROR [void dunedaq::cmdlib::CommandFacility::handle_command(const cmdobj_t&, dunedaq::cmdlib::cmd::CommandReply) at /scratch/dev/sourcecode/cmdlib/src/CommandFacility.cpp:64] Execution of command failed: Caught ers::Issue

    was caused by: 2021-Nov-18 08:50:36,780 ERROR [void dunedaq::readoutlibs::ReadoutModel<ReadoutType, RequestHandlerType, LatencyBufferType, RawDataProcessorType>::init(const json&) [with ReadoutType = dunedaq::fdreadoutlibs::types::SW_WIB_TRIGGERPRIMITIVE_STRUCT; RequestHandlerType = dunedaq::readoutlibs::EmptyFragmentRequestHandlerModel<dunedaq::fdreadoutlibs::types::SW_WIB_TRIGGERPRIMITIVE_STRUCT, dunedaq::readoutlibs::BinarySearchQueueModel<dunedaq::fdreadoutlibs::types::SW_WIB_TRIGGERPRIMITIVE_STRUCT> >; LatencyBufferType = dunedaq::readoutlibs::BinarySearchQueueModel<dunedaq::fdreadoutlibs::types::SW_WIB_TRIGGERPRIMITIVE_STRUCT>; RawDataProcessorType = dunedaq::fdreadoutlibs::SWWIBTriggerPrimitiveProcessor; nlohmann::json = nlohmann::basic_json<>] at /home/biery/dunedaq/18NovNetMgr/sourcecode/readoutlibs/include/readoutlibs/models/ReadoutModel.hpp:91]  The Could not find all necessary queues: raw_input or fragment_queue queue was not successfully created for ReadoutModel

    was caused by: 2021-Nov-18 08:50:36,780 ERROR [dunedaq::appfwk::IndexedQueueInfos_t dunedaq::appfwk::queue_index(const json&, std::vector<std::__cxx11::basic_string<char> >) at /home/biery/dunedaq/18NovNetMgr/sourcecode/appfwk/src/DAQModuleHelper.cpp:27] Schema error: missing queue: fragment_queue

eflumerf commented 2 years ago

I just pushed a commit that at least results in a working system with TPs and DQM enabled, but TP transmission still seems to be broken: I do not get the expected output in the HDF5 file.

floriangroetschla commented 2 years ago

It seems like the element IDs for the TP data handlers are broken. This happened when RU_CONFIG was introduced about a week ago. Before that, we passed the total number of data producers to the readout confgen so that, with two or more readout apps, the WIB links were numbered from 0 to TOTAL_NUMBER_OF_DATA_PRODUCERS-1 and the TP handlers from TOTAL_NUMBER_OF_DATA_PRODUCERS to TOTAL_NUMBER_OF_DATA_PRODUCERS*2-1. Since RU_CONFIG was introduced, this scheme is no longer used.

We only notice this now because I added a check in the data handler that the GeoID in the DataRequest (a field added recently in the NetworkManager branch, which is why the check was not there before) matches the one the link is configured with. If it does not match, an ERS error is issued and no fragment is returned, since this is a serious misconfiguration. Is there a way for the readout_gen script to know the total number of producers now, so that we can set the element IDs for the TP handlers correctly?
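To make the numbering scheme above concrete, here is a minimal Python sketch, assuming each readout app is described only by its number of WIB links. The function names `element_ids` and `check_request` and the dict-based request are hypothetical illustrations, not the actual confgen or data-handler API. It shows why the generator needs the global total: every TP handler ID is offset by TOTAL_NUMBER_OF_DATA_PRODUCERS, so a single app cannot compute its TP IDs from its own link count alone.

```python
# Hypothetical sketch of the element-ID scheme described above.
# Names and data shapes are illustrative, not the real confgen API.

def element_ids(links_per_ru):
    """Assign global element IDs to WIB links and TP handlers.

    links_per_ru: number of WIB links in each readout app, in order.
    WIB links get 0..TOTAL-1, TP handlers get TOTAL..2*TOTAL-1.
    """
    total = sum(links_per_ru)  # TOTAL_NUMBER_OF_DATA_PRODUCERS
    ids = []
    first = 0  # first global WIB link ID of the current readout app
    for n in links_per_ru:
        wib = list(range(first, first + n))
        tp = [total + i for i in wib]  # TP handler IDs are offset by the total
        ids.append({"wib": wib, "tp": tp})
        first += n
    return ids


def check_request(configured_element_id, request):
    """Mimic the new data-handler check (hypothetical request format).

    Returns None (no fragment) when the requested element ID does not match
    the ID the link was configured with; the real handler also issues an
    ERS error in that case.
    """
    if request["element_id"] != configured_element_id:
        return None
    return {"element_id": configured_element_id, "payload": b""}
```

For two readout apps with two links each, `element_ids([2, 2])` gives WIB IDs 0-3 and TP handler IDs 4-7. If a TP handler were instead configured with an ID computed from its own app's link count only, the IDs in incoming DataRequests would not match and `check_request` would return no fragment.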

floriangroetschla commented 2 years ago

I would like to remove the tp_fragment_sender completely and use a single FragmentSender for everything (dataflow, DQM and TPs). Or is there a reason why we should keep a separate one for the TPs?

bieryAtFnal commented 2 years ago

I don't see how fragments are sent to the DQM process. Is there supposed to be a FragmentSender for that? There is a QueueToNetwork instance for data_fragments_q_dqm, but I don't see that queue being given to the DataLinkHandler anywhere. I see hints that the DQM processes may not be receiving data, but I don't have full confirmation of that yet.

floriangroetschla commented 2 years ago

The data_fragments_q_dqm queue is indeed not used anymore, but DQM should get its fragments through the same FragmentSender as dataflow. The FragmentSender consumes pairs of fragments and connection IDs (which come from the DataRequest) and sends each fragment to the appropriate TRB using the NetworkManager.
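The dispatch pattern described above can be sketched in a few lines of Python. This is only a model of the logic, assuming fragments arrive on a single queue as `(fragment, connection_id)` tuples; `run_fragment_sender` and the `send_to` callback are hypothetical stand-ins, the latter for the NetworkManager send call, and the connection names in the usage note are made up.

```python
import queue


def run_fragment_sender(pairs, send_to):
    """Drain (fragment, connection_id) pairs and send each fragment to the
    connection named in its DataRequest.

    pairs:   queue.Queue of (fragment, connection_id) tuples.
    send_to: hypothetical callable(connection_id, fragment) standing in for
             the NetworkManager send.
    Returns the number of fragments sent.
    """
    sent = 0
    while True:
        try:
            fragment, conn_id = pairs.get_nowait()
        except queue.Empty:
            break  # nothing left to send in this sketch
        send_to(conn_id, fragment)  # destination comes from the request itself
        sent += 1
    return sent
```

Because the destination travels with each fragment, one sender can serve dataflow, DQM and TP requests alike; no per-consumer queue such as data_fragments_q_dqm is needed. For example, putting `(b"frag-df", "trb_dataflow")` and `(b"frag-dqm", "trb_dqm")` on the queue delivers each fragment to its own connection.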

bieryAtFnal commented 2 years ago

Ah, nice, FragmentSender handles everything. I've confirmed that DQM is, in fact, getting data.