cmsdaq / DAQAggregator

Aggregate monitoring data from the CMS DAQ system

When determining an FRLPc's crashed flag, only take into account jobs that are running on the correct port #116

Open · Phil2812 opened this issue 6 years ago

Phil2812 commented 6 years ago

When determining the crashed state of an FRLPc, the aggregator looks at all jobs in the jobTable of the FRLPc host's context [1]. However, it should only consider jobs whose jid includes the FRLPc's hostname and port, not every job running in the host's context.
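A minimal sketch of the proposed filter, assuming the jid contains the endpoint as the substring `hostname:port`. The `Job` class, its fields and the method `jobsForEndpoint` are hypothetical stand-ins, not the aggregator's actual types. The boundary checks keep port 11100 from matching a jid for port 111000, and `frlpc-host` from matching `xfrlpc-host`, which a plain `String.contains` would not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class JobFilter {

    /** Hypothetical stand-in for one row of the jobTable; field names are assumptions. */
    static final class Job {
        final String jid;
        final String status;
        final long startTime;

        Job(String jid, String status, long startTime) {
            this.jid = jid;
            this.status = status;
            this.startTime = startTime;
        }
    }

    /**
     * Returns only the jobs whose jid contains "hostname:port" for the given endpoint.
     * The lookbehind rejects a longer hostname ending in the same characters, and the
     * lookahead rejects a longer port number starting with the same digits.
     */
    static List<Job> jobsForEndpoint(List<Job> jobTable, String hostname, int port) {
        Pattern endpoint = Pattern.compile(
                "(?<![\\w.-])" + Pattern.quote(hostname) + ":" + port + "(?!\\d)");
        List<Job> result = new ArrayList<>();
        for (Job job : jobTable) {
            if (job.jid != null && endpoint.matcher(job.jid).find()) {
                result.add(job);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Job> jobTable = new ArrayList<>();
        jobTable.add(new Job("frlpc-host:11100/1", "running", 100L));
        jobTable.add(new Job("frlpc-host:9999/2", "running", 200L));
        // Only the job on port 11100 is relevant for this FRLPc.
        System.out.println(jobsForEndpoint(jobTable, "frlpc-host", 11100).size()); // prints 1
    }
}
```

The crashed flag would then be derived from the filtered list only, so a crashed job on another port of the same host no longer marks this FRLPc as crashed.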

Also, if multiple jobs with the same jid (same host, same port) exist, for example because the job crashed and remains in the table in Z-state [2], only the most recent job (by startTime) should be considered when determining the status.
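One way to keep only the newest entry per jid is to group the jobs by jid and retain the one with the largest startTime. This is a sketch under assumptions: `Job`, `latestPerJid` and the numeric `startTime` are hypothetical, and the real jobTable may store startTime as a string that has to be parsed first.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatestJobSelector {

    /** Hypothetical stand-in for one jobTable row; field names are assumptions. */
    static final class Job {
        final String jid;
        final String status;
        final long startTime; // assumed to be comparable, e.g. epoch milliseconds

        Job(String jid, String status, long startTime) {
            this.jid = jid;
            this.status = status;
            this.startTime = startTime;
        }
    }

    /**
     * For each jid, keeps only the job with the latest startTime.
     * On equal startTimes the job seen first is kept.
     */
    static Collection<Job> latestPerJid(List<Job> jobs) {
        Map<String, Job> latest = new LinkedHashMap<>();
        for (Job job : jobs) {
            // merge() passes (existing, incoming); keep whichever started later.
            latest.merge(job.jid, job,
                    (existing, incoming) -> incoming.startTime > existing.startTime ? incoming : existing);
        }
        return latest.values();
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        jobs.add(new Job("frlpc-host:11100/1", "zombie", 100L));  // leftover from a crash
        jobs.add(new Job("frlpc-host:11100/1", "running", 200L)); // restarted job, same jid
        for (Job job : latestPerJid(jobs)) {
            System.out.println(job.jid + " " + job.status); // prints frlpc-host:11100/1 running
        }
    }
}
```

Applying this after the host/port filter means a stale Z-state entry no longer overrides the status of the job that replaced it.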

(Consider doing the same for FMMApplication, RU and BU.)

[1] https://github.com/cmsdaq/DAQAggregator/blob/06d2fc63631db0d0bf3783527cbd1b95479ab3c2/src/main/java/rcms/utilities/daqaggregator/data/FRLPc.java#L81

[2] example_duplicate_jid_2018-09-05 18-23-26