cmsdaq / DAQExpert

New expert system processing data model produced by DAQAggregator

DQM latency check #21

Open gladky opened 7 years ago

gladky commented 7 years ago

Where to find the DQM latency?

andreh12 commented 7 years ago

@vanbesien is there a simple way to programmatically (e.g. via an HTTP query) get the number of the latest lumi section that has been processed by DQM?

andreh12 commented 7 years ago

I talked to @vanbesien, and he suggested that we look at the DQM BU disk and find the file with the highest lumi section number of the ongoing run.

We can compare this to the current lumi section (from TCDSGlobalInfo?) to get the delay.

To access the DQM BU disk, we would either use ssh or, more likely, mount it read-only on the DAQ expert machine.
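A minimal sketch of that comparison, assuming the DQM BU disk is mounted read-only and that file names follow a `run<number>_ls<number>_...` pattern. The pattern, the example file names, and the function names `latest_lumi_on_disk` and `dqm_latency` are assumptions for illustration, not something confirmed in this thread. The current lumi section is passed in by the caller, since how to read it (TCDSGlobalInfo or otherwise) is still open.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed file-name pattern, e.g. "run300123_ls0042_streamDQM_bu.jsn" (hypothetical).
LS_PATTERN = re.compile(r"^run(\d+)_ls(\d+)_")


def latest_lumi_on_disk(directory: Path, run: int) -> Optional[int]:
    """Return the highest lumi section number found for `run`, or None if no file matches."""
    latest = None
    for entry in directory.iterdir():
        match = LS_PATTERN.match(entry.name)
        if match and int(match.group(1)) == run:
            ls = int(match.group(2))
            if latest is None or ls > latest:
                latest = ls
    return latest


def dqm_latency(current_lumi: int, directory: Path, run: int) -> Optional[int]:
    """Return the delay in lumi sections, or None if nothing has arrived for this run yet."""
    latest = latest_lumi_on_disk(directory, run)
    if latest is None:
        return None
    # Clamp at zero in case the disk is briefly ahead of the reported global lumi.
    return max(current_lumi - latest, 0)
```

Returning `None` when no file exists keeps "nothing delivered yet" distinct from "zero delay", which the expert logic would probably want to treat differently.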

mommsen commented 7 years ago

The latencies are monitored by F3mon (for all streams and steps). Thus, I think it would be better to query the F3 ES for this kind of information.

smorovic commented 7 years ago

F3mon can help up to macro-merge completion, but that is not the full picture. The transfer script, which copies to the DQM BU disk, doesn't (yet) inject any documents into Elasticsearch.

vanbesien commented 7 years ago

My two cents: The best way to check what the transfer service is doing is to look at the actual results on the ramdisk, since this is where the DAQ chain ends and the DQM chain starts. From an architectural point of view, it's probably cleanest to write a very simple low-dependency service that does nothing more than collect files and timestamps and inject them into ES (as @smorovic suggests).

@dmitrijus
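As a rough illustration of such a low-dependency collector, the sketch below uses only the Python standard library: it lists files with their modification timestamps and formats them as a newline-delimited body for the Elasticsearch `_bulk` endpoint. The index name, the document fields, and the function names are hypothetical, and depending on the Elasticsearch version in use the action line may also need a `_type`.

```python
import json
from pathlib import Path
from typing import Dict, List
from urllib import request


def collect_file_records(directory: Path) -> List[Dict]:
    """Collect name, modification time and size of every regular file in `directory`."""
    records = []
    for entry in sorted(directory.iterdir()):
        if entry.is_file():
            stat = entry.stat()
            records.append({"file": entry.name, "mtime": stat.st_mtime, "size": stat.st_size})
    return records


def build_bulk_body(records: List[Dict], index: str) -> str:
    """Format records as NDJSON: one action line followed by one document line per record."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def post_bulk(es_url: str, body: str) -> int:
    """POST the bulk body to `<es_url>/_bulk` and return the HTTP status code."""
    req = request.Request(
        es_url.rstrip("/") + "/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with request.urlopen(req, timeout=10) as resp:
        return resp.status
```

A caller would run `collect_file_records` on the ramdisk directory periodically and pass the result through `build_bulk_body` and `post_bulk`; deduplicating files that were already reported is left out of this sketch.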

dmitrijus commented 7 years ago

I agree with what Broen said. Checking the last lumi delivered on the ramdisk BU and comparing it to the "global" lumi is the way to go.

Additionally, we don't want you to depend on our monitoring. It can change rapidly, and the fewer dependencies, the better.