archiver-appliance / epicsarchiverap

This is an implementation of an archiver for EPICS control systems that aims to archive millions of PVs.

PV connected but not monitored after 1st sample #135

Closed spialla closed 2 years ago

spialla commented 2 years ago

Some PVs appear as connected and monitored, but the last event received is the first sample from the initial connection, and nothing is refreshed after that. If I pause and resume the PV, one new value is sampled and then monitoring stops again.

No reports are produced for such PVs.

With caget/camonitor I can access the variable without issues.

Any suggestions?

djlauk commented 2 years ago

We had the same behaviour. I was not able to find any pattern of why this happens, and after one system service day when we rebooted all the machines, the faulty behaviour was gone and never came back.

slacmshankar commented 2 years ago

In the PVDetails (in recent releases), there should be a "Last monitor received at" entry that tells us whether we actually received a monitor event. Does this look correct?

spialla commented 2 years ago

No. Judging by the stored values, "Last monitor received at" is updated only once, and only in two cases: when the PV is assigned to an appliance and after a pause/resume.

slacmshankar commented 2 years ago

Could we make a claim that we only receive one monitor event from the underlying layers? That is, if we had received more monitor events, this timestamp would have reflected the latest ca_monitor event. So any issue is upstream of https://github.com/slacmshankar/epicsarchiverap/blob/master/src/main/org/epics/archiverappliance/engine/pv/EPICS_V3_PV.java#L622

Can you attach the PVDetails for this PV? This is /getPVDetails from https://slacmshankar.github.io/epicsarchiver_docs/api/mgmt_scriptables.html
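For anyone following along, here is a minimal sketch of pulling the PVDetails from a script. It assumes the mgmt webapp is reachable at `http://localhost:17665/mgmt/bpl` (a common default, not something stated in this thread) and that `getPVDetails` takes the PV name in a `pv` query parameter; the function names are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

# Assumption: mgmt webapp on localhost, port 17665; adjust to your deployment.
DEFAULT_MGMT_URL = "http://localhost:17665/mgmt/bpl"


def pv_details_url(pv_name, mgmt_url=DEFAULT_MGMT_URL):
    """Build the getPVDetails URL for one PV, percent-encoding the name."""
    query = urllib.parse.urlencode({"pv": pv_name})
    return f"{mgmt_url}/getPVDetails?{query}"


def fetch_pv_details(pv_name, mgmt_url=DEFAULT_MGMT_URL, timeout=10.0):
    """Return the PVDetails as a list of {"name", "value", "source"} dicts."""
    url = pv_details_url(pv_name, mgmt_url)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_pv_details() against a live appliance.
    print(pv_details_url("RR-R-LLRF-DIGITAL-001:ad1:ch3:power_remote_s"))
```

Saving the returned list with `json.dump` makes it easy to attach to an issue or to compare before and after a pause/resume.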

slacmshankar commented 2 years ago

Also, make sure that when you check with camonitor you are using the proper mask, something like camonitor -m al

spialla commented 2 years ago

[
{"name":"PV Name","value":"RR-R-LLRF-DIGITAL-001:ad1:ch3:power_remote_s","source":"mgmt"},
{"name":"Instance archiving PV","value":"appliance0","source":"mgmt"},
{"name":"Archival params creation time:","value":"May\/05\/2022 09:28:52 +02:00","source":"mgmt"},
{"name":"Archival params modification time:","value":"May\/05\/2022 09:28:52 +02:00","source":"mgmt"},
{"name":"Archiver DBR type (from typeinfo):","value":"DBR_SCALAR_DOUBLE","source":"mgmt"},
{"name":"Is this a scalar:","value":"Yes","source":"mgmt"},
{"name":"Number of elements:","value":"1","source":"mgmt"},
{"name":"Precision:","value":"0.0","source":"mgmt"},
{"name":"Units:","value":"","source":"mgmt"},
{"name":"Is this PV paused:","value":"No","source":"mgmt"},
{"name":"Sampling method:","value":"SCAN","source":"mgmt"},
{"name":"Sampling period:","value":"1.0","source":"mgmt"},
{"name":"Are we using PVAccess?","value":"No","source":"mgmt"},
{"name":"Extra info - SCAN:","value":"1 second","source":"mgmt"},
{"name":"Extra info - RTYP:","value":"liberaSignal","source":"mgmt"},
{"name":"Channel Name","source":"pv","value":"RR-R-LLRF-DIGITAL-001:ad1:ch3:power_remote_s"},
{"name":"Host name","source":"pv","value":"192.168.117.43"},
{"name":"Controlling PV","source":"pv","value":""},
{"name":"Is engine currently archiving this?","source":"pv","value":"yes"},
{"name":"Archiver DBR type (initial)","source":"pv","value":"DBR_SCALAR_DOUBLE"},
{"name":"Archiver DBR type (from CA)","source":"pv","value":"DBR_SCALAR_DOUBLE"},
{"name":"Number of elements per event (from CA)","source":"pv","value":"1"},
{"name":"Is engine using monitors?","source":"pv","value":"no"},
{"name":"What's the engine's sampling period?","source":"pv","value":"1.0"},
{"name":"The SCAN period (ms) after applying the jitter factor","source":"pv","value":"950"},
{"name":"Is this PV currently connected?","source":"pv","value":"yes"},
{"name":"Connection state at last connection changed event","source":"pv","value":"Not connected"},
{"name":"When did we receive the last event?","source":"pv","value":"May\/05\/2022 09:29:12 +02:00"},
{"name":"What did we last push the data to the short term store?","source":"pv","value":"May\/05\/2022 09:29:22 +02:00"},
{"name":"When did we request CA to make a connection to this PV?","source":"pv","value":"May\/05\/2022 09:29:12 +02:00"},
{"name":"When did we first establish a connection to this PV?","source":"pv","value":"May\/05\/2022 09:29:12 +02:00"},
{"name":"When did we last lose and reestablish a connection to this PV?","source":"pv","value":"Never"},
{"name":"When did we last lose a connection to this PV?","source":"pv","value":"Never"},
{"name":"How many times have we lost and regained the connection to this PV?","source":"pv","value":"0"},
{"name":"How many events so far?","source":"pv","value":"1"},
{"name":"How many raw scan events so far?","source":"pv","value":"1"},
{"name":"How many events lost because the timestamp is in the far future or past so far?","source":"pv","value":"0"},
{"name":"Timestamp of last event from the IOC - correct or not.","source":"pv","value":"May\/05\/2022 09:30:15 +02:00"},
{"name":"How many events lost because the sample buffer is full so far?","source":"pv","value":"0"},
{"name":"How many events lost because the DBR_Type of the PV has changed from what it used to be?","source":"pv","value":"0"},
{"name":"How many events lost totally so far?","source":"pv","value":"0"},
{"name":"Average bytes per event","source":"pv","value":"95"},
{"name":"Estimated event rate (events\/sec)","source":"pv","value":"0.04"},
{"name":"Estimated storage rate (KB\/hour)","source":"pv","value":"14.52"},
{"name":"Estimated storage rate (MB\/day)","source":"pv","value":"0.34"},
{"name":"Estimated storage rate (GB\/year)","source":"pv","value":"0.12"},
{"name":"PV connection state machine state","source":"CA","value":"GotMonitor"},
{"name":"Last monitor received at","source":"CA","value":"May\/05\/2022 09:29:12 +02:00"},
{"name":"Last monitor had a valid DBR?","source":"CA","value":"true"},
{"name":"Last monitor event timestamp","source":"CA","value":"May\/05\/2022 09:30:15 +02:00"},
{"name":"Various transient errors","source":"CA","value":"0"},
{"name":"Do we have a CA channel?","source":"CA","value":"true"},
{"name":"Do we have a subscription?","source":"CA","value":"true"},
{"name":"CAJ Searches","source":"CA","value":"1"},
{"name":"CAJ channel ID","source":"CA","value":"2942"},
{"name":"CAJ server channel ID","source":"CA","value":"2929"},
{"name":"CAJ connection state","source":"CA","value":"CONNECTED"},
{"name":"Daily metadata last saved at","source":"CA","value":"Never"},
{"name":"Do we use DBE_Properties?","source":"CA","value":"false"},
{"name":"Do we have a DBE Properties subscription?","source":"CA","value":"false"},
{"name":"The internal connected bool","source":"CA","value":"true"},
{"name":"The internal running bool","source":"CA","value":"true"},
{"name":"Do we have a valid DBR Type constructor","source":"CA","value":"true"},
{"name":"The CAJ command thread id","source":"CA","value":"3"},
{"name":"Any other PV's being controlled?","source":"CA","value":"false"},
{"name":"Has metafields?","source":"CA","value":"true"},
{"name":"Hostname of PV from CA","source":"CA","value":"192.168.117.43"},
{"name":"Channels for the extra fields","source":"pv","value":"1"},
{"name":"Connected channels for the extra fields","source":"pv","value":"0"},
{"name":"Sample buffer capacity","source":"pv","value":"11"},
{"name":"Time elapsed since search request (s)","source":"pv","value":"-1"},
{"name":"Name (from ETL)","source":"etl","value":"RR-R-LLRF-DIGITAL-001:ad1:ch3:power_remote_s"}
]
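A rough sketch of how the relevant entries can be pulled out of a PVDetails dump like this one. The sample below copies a handful of entries from the output above; `looks_stalled` is a hypothetical heuristic of my own, not an archiver feature, and it simply encodes the symptom described in this issue (connected, exactly one event, last monitor received at connection time).

```python
import json

# A few entries copied from the PVDetails in this comment; the full list works the same way.
SAMPLE = json.loads("""[
  {"name": "Sampling method:", "value": "SCAN", "source": "mgmt"},
  {"name": "Is this PV currently connected?", "source": "pv", "value": "yes"},
  {"name": "When did we first establish a connection to this PV?", "source": "pv", "value": "May/05/2022 09:29:12 +02:00"},
  {"name": "How many events so far?", "source": "pv", "value": "1"},
  {"name": "PV connection state machine state", "source": "CA", "value": "GotMonitor"},
  {"name": "Last monitor received at", "source": "CA", "value": "May/05/2022 09:29:12 +02:00"}
]""")


def as_dict(details):
    """Flatten the list of {"name", "value", "source"} entries into name -> value."""
    return {entry["name"]: entry["value"] for entry in details}


def looks_stalled(details):
    """Hypothetical heuristic: connected, one event, and no monitor since connecting."""
    d = as_dict(details)
    connected = d.get("Is this PV currently connected?") == "yes"
    single_event = d.get("How many events so far?") == "1"
    connected_at = d.get("When did we first establish a connection to this PV?")
    no_later_monitor = d.get("Last monitor received at") == connected_at
    return connected and single_event and no_later_monitor


if __name__ == "__main__":
    print(looks_stalled(SAMPLE))
```

A PV that has only just connected would also match, so the check is only meaningful some time after the connection was made.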

spialla commented 2 years ago

Yes, plain "camonitor" shows the value changing, while "camonitor -m al" behaves the same as the archiver.

slacmshankar commented 2 years ago

Thanks for the PVDetails. I'll stare at it a bit more.

If "camonitor -m al" behaves the same as the archiver, then I think the issue is with your PV. This could be a difference between your ADEL and MDEL; the archiver uses the equivalent of "camonitor -m al" when establishing monitors with the PV.
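To illustrate the ADEL/MDEL point, here is a simplified model of how a standard analog record decides which event classes to post. It is a sketch under assumptions: it mimics the usual deadband logic (MDEL gates value events, ADEL gates log/archive events) and says nothing about how a custom record type such as liberaSignal behaves. `DeadbandRecord` and `describe` are made-up names.

```python
# CA event-class bits, as defined by EPICS Channel Access.
DBE_VALUE = 1
DBE_LOG = 2    # also called DBE_ARCHIVE
DBE_ALARM = 4


class DeadbandRecord:
    """Simplified model of an analog record's monitor deadbands."""

    def __init__(self, mdel=0.0, adel=0.0):
        self.mdel = mdel   # deadband for value monitors
        self.adel = adel   # deadband for archive (log) monitors
        self.mlst = None   # last value posted to value monitors
        self.alst = None   # last value posted to archive monitors

    def process(self, val):
        """Return the event mask posted for this new value (0 means nothing posted)."""
        mask = 0
        if self.mlst is None or abs(val - self.mlst) > self.mdel:
            mask |= DBE_VALUE
            self.mlst = val
        if self.alst is None or abs(val - self.alst) > self.adel:
            mask |= DBE_LOG
            self.alst = val
        return mask


def describe(mask):
    """Render an event mask as a readable string such as 'VALUE|LOG'."""
    bits = ((DBE_VALUE, "VALUE"), (DBE_LOG, "LOG"), (DBE_ALARM, "ALARM"))
    return "|".join(name for bit, name in bits if mask & bit) or "none"


# ADEL much larger than MDEL: small changes reach value monitors only.
rec = DeadbandRecord(mdel=0.0, adel=10.0)
posted = [describe(rec.process(v)) for v in (1.0, 1.5, 2.0, 12.0)]
print(posted)
```

With these settings only the first sample and the jump to 12.0 carry the log bit, so a client that subscribes without the value bit sees long gaps even though the value keeps changing.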

spialla commented 2 years ago

Unfortunately, the IOC is embedded in the device and provided by the manufacturer, with a custom data type "liberaSignal". I tried adding some CA links to that IOC to map "liberaSignal" onto standard "ai" records, and the archiver works properly with these wrapped PVs.

slacmshankar commented 2 years ago

So, can I close this issue then?

spialla commented 2 years ago

If a CA link is the only possible workaround, you can close the issue. Thanks for the support.

nikitakuklev commented 2 years ago

We have a similar issue: some fields on very old IOCs are hardcoded to post only with the value monitor mask, and are thus not archived. It is difficult to change IOC configs, so a preferred solution would be a global or per-PV flag to use the value mask.

As a temporary global fix, am I correct that only one line needs to be changed: https://github.com/slacmshankar/epicsarchiverap/blob/master/src/main/org/epics/archiverappliance/engine/pv/EPICS_V3_PV.java#L457?
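For context, a small sketch of why a field that only posts value events never reaches a subscription made with the alarm and log bits. It models the Channel Access rule that an event is delivered when the posted mask and the subscription mask share at least one bit; the letters follow the camonitor `-m` convention (v, a, l, p). `parse_mask` and `is_delivered` are illustrative names, and nothing here reflects what the linked line actually contains.

```python
# CA event-class bits, as defined by EPICS Channel Access.
DBE_VALUE, DBE_LOG, DBE_ALARM, DBE_PROPERTY = 1, 2, 4, 8

_LETTERS = {"v": DBE_VALUE, "l": DBE_LOG, "a": DBE_ALARM, "p": DBE_PROPERTY}


def parse_mask(letters):
    """Turn camonitor-style mask letters (e.g. 'al') into a bit mask."""
    mask = 0
    for ch in letters:
        if ch not in _LETTERS:
            raise ValueError(f"unknown mask letter: {ch!r}")
        mask |= _LETTERS[ch]
    return mask


def is_delivered(posted_mask, subscription_mask):
    """An event reaches a subscriber only if the two masks overlap."""
    return bool(posted_mask & subscription_mask)


if __name__ == "__main__":
    value_only_event = DBE_VALUE
    # Subscription with alarm + log bits: the value-only event is dropped.
    print(is_delivered(value_only_event, parse_mask("al")))
    # Adding the value bit to the subscription lets it through.
    print(is_delivered(value_only_event, parse_mask("val")))
```

Adding the value bit means every value-deadband change is delivered, so the event rate for such PVs follows MDEL rather than ADEL.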

nikitakuklev commented 2 years ago

Update - this change worked. @spialla - I can send the modified files if you want to try with your setup.

Detailed steps (Win10 x64):