ChannelFinder / recsync

EPICS Record Synchronizor
Other
14 stars 25 forks source link

Channels "time" never update / polling never finishes #86

Open kingspride opened 4 months ago

kingspride commented 4 months ago

Hello,

I have a question regarding the recceivers' behaviour.

see this excerpt from loglevel debug:

2024-04-23T01:15:22+0200 [-] DEBUG:recceiver.cfstore Polling begins...
2024-04-23T01:15:22+0200 [-] DEBUG:recceiver.cfstore Find existing channels by IOCID: 10.0.0.15:34980
2024-04-23T01:15:22+0200 [-] DEBUG:urllib3.connectionpool http://localhost:8080 "GET /ChannelFinder/resources/channels?iocid=10.0.0.15%3A34980 HTTP/1.1" 200 None
2024-04-23T01:15:22+0200 [-] DEBUG:recceiver.cfstore Find existing channels by name: FHIFEL:GagePort:Port1_value|FHIFEL:GagePort:Port2_value|FHIFEL:GagePort:Port1_count|FHIFEL:GagePort:Port2_count
2024-04-23T01:15:22+0200 [-] DEBUG:urllib3.connectionpool http://localhost:8080 "GET /ChannelFinder/resources/channels?~name=FHIFEL%3AGagePort%3APort1_value%7CFHIFEL%3AGagePort%3APort2_value%7CFHIFEL%3AGagePort%3APort1_count%7CFHIFEL%3AGagePort%3APort2_count HTTP/1.1" 200 None
2024-04-23T01:15:22+0200 [-] DEBUG:recceiver.cfstore Add existing channel with different IOC: {'name': 'FHIFEL:GagePort:Port1_value', 'owner': 'admin', 'properties': [{'name': 'hostName', 'owner': 'cfstore', 'value': 'fel03'}, {'name': 'iocName', 'owner': 'cfstore', 'value': 'iocGagePort'}, {'name': 'iocid', 'owner': 'cfstore', 'value': '10.0.0.15:34980'}, {'name': 'pvStatus', 'owner': 'cfstore', 'value': 'Active'}, {'name': 'time', 'owner': 'cfstore', 'value': '2024-04-23 01:15:22.941623'}, {'name': 'recceiverID', 'owner': 'cfstore', 'value': 'fel-recceiver'}, {'name': 'recordType', 'owner': 'cfstore', 'value': 'ai'}], 'tags': []}
2024-04-23T01:15:22+0200 [-] DEBUG:recceiver.cfstore Add existing channel with different IOC: {'name': 'FHIFEL:GagePort:Port2_value', 'owner': 'admin', 'properties': [{'name': 'hostName', 'owner': 'cfstore', 'value': 'fel03'}, {'name': 'iocName', 'owner': 'cfstore', 'value': 'iocGagePort'}, {'name': 'iocid', 'owner': 'cfstore', 'value': '10.0.0.15:34980'}, {'name': 'pvStatus', 'owner': 'cfstore', 'value': 'Active'}, {'name': 'time', 'owner': 'cfstore', 'value': '2024-04-23 01:15:22.941623'}, {'name': 'recceiverID', 'owner': 'cfstore', 'value': 'fel-recceiver'}, {'name': 'recordType', 'owner': 'cfstore', 'value': 'ai'}], 'tags': []}
2024-04-23T01:15:22+0200 [-] DEBUG:recceiver.cfstore Add existing channel with different IOC: {'name': 'FHIFEL:GagePort:Port1_count', 'owner': 'admin', 'properties': [{'name': 'hostName', 'owner': 'cfstore', 'value': 'fel03'}, {'name': 'iocName', 'owner': 'cfstore', 'value': 'iocGagePort'}, {'name': 'iocid', 'owner': 'cfstore', 'value': '10.0.0.15:34980'}, {'name': 'pvStatus', 'owner': 'cfstore', 'value': 'Active'}, {'name': 'time', 'owner': 'cfstore', 'value': '2024-04-23 01:15:22.941623'}, {'name': 'recceiverID', 'owner': 'cfstore', 'value': 'fel-recceiver'}, {'name': 'recordType', 'owner': 'cfstore', 'value': 'ai'}], 'tags': []}
2024-04-23T01:15:22+0200 [-] DEBUG:recceiver.cfstore Add existing channel with different IOC: {'name': 'FHIFEL:GagePort:Port2_count', 'owner': 'admin', 'properties': [{'name': 'hostName', 'owner': 'cfstore', 'value': 'fel03'}, {'name': 'iocName', 'owner': 'cfstore', 'value': 'iocGagePort'}, {'name': 'iocid', 'owner': 'cfstore', 'value': '10.0.0.15:34980'}, {'name': 'pvStatus', 'owner': 'cfstore', 'value': 'Active'}, {'name': 'time', 'owner': 'cfstore', 'value': '2024-04-23 01:15:22.941623'}, {'name': 'recceiverID', 'owner': 'cfstore', 'value': 'fel-recceiver'}, {'name': 'recordType', 'owner': 'cfstore', 'value': 'ai'}], 'tags': []}
2024-04-23T01:15:22+0200 [-] INFO:recceiver.cfstore Total channels to update: 4

what I understand that happens is:

my question now is: is it intended that the time, and pvStatus properties never update? why? I noticed that the debug log states "Polling begins", but never "polling complete", or "update channel with same IOC". shouldnt the time and pvStatus be regularly updated as some kind of heartbeat, and put a channel "inactive", when a client disconnects / disappears?

For example, I have a couple of channels in my channelfinder marked as active, but they are offline right now. You cant tell from the channelfinder and would assume they are usable.

I need some clarification :D

thanks in advance, William

jacomago commented 4 months ago

Hi William,

Yes, pvStatus is the status the last time the channel changed. The heartbeat (ping pong) is to update the status of the IOC to Inactive if the IOC goes down. I don't know if we ever want to change this @shroffk ?

If you have channels that are marked as active but they are down, then a possible situation is the ChannelFinder and/or recceiver was down when the ioc went down. You can use the cleanOnStart configuration in the recceiver to mark all channels Inactive on boot to avoid this problem. Otherwise you've likely found a bug :) If you can reproduce please let us know!

I hope that helps! Sky

kingspride commented 4 months ago

hm, okay. I'll rewrite my custom webinterface / list to make clear its "last changed".

also, I have weird behavior on my channelfinder itself, as I just noticed... It says it has 24753 channels in total, but when I try to get any channel after the 10,000th, I get an Error 500 and "all shards failed" in the log.

so far, channelfinder and all its surroundings are not very pleasant to maintain or use, unfortunately. :( I appreciate the idea and concept but somehow a lot can go wrong and make stuff kind of unreliable... sorry :)

jacomago commented 4 months ago

@kingspride Don't worry I agree. We've had some long debug sessions of recsync up to ChannelFinder, it can get confusing. More help in making it more stable is always appreciated.

The 10 000 is a limit on the Elasticsearch side, but I'm surprised by the error. Can you give me more details on the problem in the ChannelFinderService repo?

kingspride commented 4 months ago

sure. see you on the other side :)

https://github.com/ChannelFinder/ChannelFinderService/issues/139

shroffk commented 4 months ago

is it intended that the time, and pvStatus properties never update? why?

The time is meant to reflect the last change on the IOC side, it should match when the IOC was last started. The pvStatus should update when the connection between the IOC is lost... usually when IOC's go down it might be due to broader n/w issues where the recceiver also goes down, in these situations you might have Channels marked active when the IOC is not. The hope is that when everything recovers then the data in CF will be corrected once the recceiver+IOC connections are re-established.

CF is trying to build central view over the distributed nature of EPICS which might have edge cases.

We are working on a newer version of recciever and contributions are welcome

kingspride commented 4 months ago

thanks!

@jacomago I see you are working on fixing the cleaning operation since its affected by the 10k limit aswell. Otherwise, I would have reported that here...

You can close the issue if you want, my questions are depleted :)