art-daq / artdaq_daqinterface

artdaq daq interface run sort based on timestamp can lead to bad behavior if folder timestamps are inadvertently changed #11

Open eflumerf opened 2 years ago

eflumerf commented 2 years ago

This issue has been migrated from https://cdcvs.fnal.gov/redmine/issues/25185 (FNAL account required) Originally created by @wesketchum on 2020-11-10 22:58:25


Just ran into this on ICARUS:

For some unknown reason, the timestamp on the folder for an old run was changed, so that running ls -lrth gave something like:

dr-xr-xr-x 2 icarus   E-1052 4.0K Nov 10 13:47 3298
dr-xr-xr-x 2 icarus   E-1052 4.0K Nov 10 14:17 3299
dr-xr-xr-x 2 icarus   E-1052 4.0K Nov 10 14:47 3300
dr-xr-xr-x 2 icarus   E-1052 4.0K Nov 10 15:15 3301
dr-xr-xr-x 2 icarus   E-1052 4.0K Nov 10 15:30 3302
dr-xr-xr-x 2 icarus   E-1052 4.0K Nov 10 15:34 3303
dr-xr-xr-x 2 icarus   E-1052 4.0K Nov 10 15:49 3304
dr-xrwxr-x 2 icarus   E-1052  225 Nov 10 16:35 1

When we then try to start a new run, DAQInterface wants it to be run number 2, which already exists, so no new run can be started.

I don't know why that happened to the folder for run 1 (clearly a separate issue). Based only on the observed problem, I'm speculating that DAQInterface sorts the run records by time to determine the most recent one: run 1's folder now has the newest timestamp, so the next run would be 1 + 1 = 2. If that's the case, perhaps it should do a true numerical sort instead? It's a little more coding, but perhaps worth it.
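A minimal Python sketch of the two strategies, assuming (as speculated above) that DAQInterface takes the run-record directory with the newest modification time as the last run and adds one. The function names and the flat directory-per-run layout are hypothetical; this is not DAQInterface's actual code. The demo recreates only the tail of the ICARUS listing in a temporary directory, with run 1 carrying the newest mtime.

```python
import os
import tempfile


def numbered_dirs(run_records_dir):
    """Yield (run_number, mtime) for each all-digit subdirectory (hypothetical layout)."""
    for entry in os.scandir(run_records_dir):
        if entry.is_dir() and entry.name.isdigit():
            yield int(entry.name), entry.stat().st_mtime


def next_run_by_mtime(run_records_dir):
    # Suspected current behavior: the most recently modified folder is the last run.
    last, _ = max(numbered_dirs(run_records_dir), key=lambda r: r[1])
    return last + 1


def next_run_numeric(run_records_dir):
    # Proposed alternative: the largest run number is the last run,
    # regardless of folder timestamps.
    return max(run for run, _ in numbered_dirs(run_records_dir)) + 1


def make_demo_records(root):
    """Recreate the listing tail: runs 3298-3304 in time order, then run 1 newest."""
    base = 1_605_000_000  # arbitrary epoch seconds
    runs = list(range(3298, 3305)) + [1]
    for i, run in enumerate(runs):
        path = os.path.join(root, str(run))
        os.mkdir(path)
        os.utime(path, (base + 60 * i, base + 60 * i))  # set atime and mtime


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_demo_records(root)
        print("by mtime:", next_run_by_mtime(root))  # 2 (run 2 already exists on ICARUS)
        print("numeric:", next_run_numeric(root))    # 3305
```

The numeric version ignores timestamps entirely, which is exactly the trade-off raised in the reply that follows.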

jcfreeman2 commented 2 years ago

It would of course be trivial to perform a true numerical sort of the run record folders. There are two reasons this hasn't been done to date; please let me know if you think they aren't strong enough to justify keeping the current behavior:

1) By performing a time sort rather than a numerical sort, you can effectively support two parallel sets of runs. E.g., on ProtoDUNE we'd count runs whose processes were controlled by JCOP up from 1, and runs whose processes were controlled directly by DAQInterface up from 1000001. So the folder /nfs/sw/artdaq/run_records/1001751 on np04 refers to the 1751st run which used DAQInterface for direct process control. With a numerical sort, you couldn't see what the last ten runs were irrespective of whether JCOP or DAQInterface controlled the processes.
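To illustrate point 1, here is a small sketch on a hypothetical run history in which JCOP runs (counting from 1) and DAQInterface runs (counting from 1000001) alternate in time. The data and function names are made up for illustration; only the two numbering ranges come from the ProtoDUNE example. The demo asks for the last four runs rather than ten to keep it short.

```python
from typing import List, Tuple

DAQINTERFACE_OFFSET = 1_000_001  # first DAQInterface-controlled run on ProtoDUNE

Run = Tuple[int, float]  # (run_number, mtime in seconds)


def last_n_by_time(runs: List[Run], n: int) -> List[int]:
    """The n most recently modified runs, oldest first."""
    return [run for run, _ in sorted(runs, key=lambda r: r[1])[-n:]]


def last_n_numeric(runs: List[Run], n: int) -> List[int]:
    """The n highest-numbered runs, lowest first."""
    return sorted(run for run, _ in runs)[-n:]


def demo_runs() -> List[Run]:
    # Hypothetical history: JCOP and DAQInterface runs alternate, one per minute.
    runs: List[Run] = []
    t = 0.0
    for k in range(6):
        runs.append((1 + k, t))
        t += 60
        runs.append((DAQINTERFACE_OFFSET + k, t))
        t += 60
    return runs


if __name__ == "__main__":
    runs = demo_runs()
    print(last_n_by_time(runs, 4))  # [5, 1000005, 6, 1000006]
    print(last_n_numeric(runs, 4))  # [1000003, 1000004, 1000005, 1000006]
```

The numeric sort hides the two most recent JCOP runs behind older DAQInterface runs, while the time sort interleaves both sequences correctly.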

2) Since it's pathological for the timestamps of your run records to be altered, it's arguably desirable that the time-based sort reveals the problem so it can be avoided in the future.
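One way to make such a problem visible explicitly, rather than through a failed run start, is a consistency check that keeps the time sort but flags when the newest-by-mtime run is not the highest-numbered run in its own sequence. This is a hypothetical sketch, not an existing DAQInterface feature; the sequence boundaries follow the ProtoDUNE example, and run numbers are assumed to be at least 1.

```python
from typing import List, Optional, Tuple

Run = Tuple[int, float]  # (run_number, mtime in seconds)

# Hypothetical sequence starts: JCOP runs from 1, DAQInterface runs from 1000001.
SEQUENCE_STARTS = (1, 1_000_001)


def sequence_of(run: int) -> int:
    """Start of the numbering sequence a run belongs to (run must be >= 1)."""
    return max(s for s in SEQUENCE_STARTS if s <= run)


def check_newest_run(runs: List[Run]) -> Optional[str]:
    """Return a warning if the newest-by-mtime run is not the highest-numbered
    run in its own sequence; otherwise return None."""
    newest, _ = max(runs, key=lambda r: r[1])
    seq = sequence_of(newest)
    highest = max(run for run, _ in runs if sequence_of(run) == seq)
    if newest != highest:
        return (f"run {newest} has the newest timestamp, but run {highest} "
                f"is higher in the same sequence; check run-record mtimes")
    return None


if __name__ == "__main__":
    healthy = [(r, 60.0 * i) for i, r in enumerate(range(3298, 3305))]
    print(check_newest_run(healthy))                # None
    print(check_newest_run(healthy + [(1, 1e6)]))   # warning about run 1
```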

eflumerf commented 2 years ago

I'll also note that, in the listing provided, Run 1 somehow has its group write flag set; normally the timestamps could not change, since the run records are not writable.
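A quick way to spot records like run 1 is to scan the run-record directory for subdirectories with any write bit set, since the healthy ones in the listing are dr-xr-xr-x. This is a minimal sketch assuming a POSIX filesystem and one all-digit subdirectory per run; the function name is hypothetical.

```python
import os
import stat
import tempfile
from typing import List, Tuple

# Owner, group, or other write permission.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def writable_run_records(run_records_dir: str) -> List[Tuple[str, str]]:
    """Return sorted (name, ls-style mode) pairs for run-record
    subdirectories that have any write bit set."""
    flagged = []
    for entry in os.scandir(run_records_dir):
        if entry.is_dir() and entry.name.isdigit():
            mode = entry.stat().st_mode
            if mode & WRITE_BITS:
                flagged.append((entry.name, stat.filemode(mode)))
    return sorted(flagged)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for name, mode in (("3304", 0o555), ("1", 0o575)):
            path = os.path.join(root, name)
            os.mkdir(path)
            os.chmod(path, mode)  # mimic the permissions shown in the listing
        print(writable_run_records(root))  # [('1', 'dr-xrwxr-x')]
```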