epics-extensions / ca-gateway

Channel Access PV Gateway
http://www.aps.anl.gov/epics/extensions/gateway/

CTRL and GR subscriptions get incorrect data #11

Closed ralphlange closed 5 years ago

ralphlange commented 8 years ago

This problem was first reported at pyepics/pyepics#40. There are some test results at that link as well.

It had an intermediate life as an EPICS Base bug at https://bugs.launchpad.net/epics-base/+bug/1510955 - that ticket has some more thoughts.

The Original Problem:

DBR_CTRL variables (units, etc.) are reported correctly only on the initial connection for a channel subscription when the channel is hosted either by a Gateway or by pcaspy. This hints at a problem in libcas or libgdd.

Test code may be found here - https://github.com/dchabot/ca_client_test

This is simply a version of the caClient example code generated via makeBaseApp, modified to print the units of a channel.

To test, point the caMonitor executable at a channel hosted by a Gateway or by pcaspy.

ralphlange commented 8 years ago

c6f69c7babe0337ff2cd0b14a290f5b674433fb0 adds a test showing the bug.

ralphlange commented 8 years ago

From https://bugs.launchpad.net/epics-base/+bug/1510955

Contrary to my first statements, I have come to the conclusion that the problem should not be fixed in CAS or GDD, but in the server applications.

CAS itself does no caching of data that is posted by the server app. It keeps a prototype GDD of the structure that the client requested, and whatever GDD the server app posts is smart-copied into a clone of the prototype and pushed to the client.

read() operations give the prototype clone (e.g. a ctrl container) to the server app to fill in. That's why the first update on setting up the connection is correct: in that case CAS does a read(), and the server app fills the complete structure. If the server app then posts e.g. value/time/status GDDs, the smart-copy to the prototype clone leaves all other elements empty - the DBR_CTRL structure only gets the posted elements, the remaining elements are zero or empty.
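The mechanism described above can be illustrated with a small model. This is only a sketch: plain Python dicts stand in for GDD containers, and `CTRL_PROTOTYPE`, `server_read` and `smart_copy` are hypothetical names for illustration, not the libcas or libgdd API.

```python
from copy import deepcopy

# Hypothetical stand-in for the prototype GDD of a DBR_CTRL request.
CTRL_PROTOTYPE = {
    "value": 0.0,
    "status": 0,
    "severity": 0,
    "units": "",
    "precision": 0,
}


def server_read(clone):
    """Model of the server app's read(): it fills every field it is handed."""
    clone.update(value=1.5, status=0, severity=0, units="mA", precision=3)
    return clone


def smart_copy(prototype, posted):
    """Model of the smart-copy: only fields present in `posted` are transferred
    into a fresh clone of the prototype; everything else keeps its empty default."""
    clone = deepcopy(prototype)
    for key in clone:
        if key in posted:
            clone[key] = posted[key]
    return clone


# First update of a subscription: a read() fills the whole structure.
first_update = server_read(deepcopy(CTRL_PROTOTYPE))

# Later update: the server app posts only value/status/severity.
later_update = smart_copy(
    CTRL_PROTOTYPE, {"value": 2.5, "status": 0, "severity": 0}
)

print(first_update["units"])        # mA
print(repr(later_update["units"]))  # ''
```

In this model the first update carries the units, while every later update delivers a fresh value with empty units and zero precision, which matches the symptom reported by clients.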

The server app has no way to tell whether a CAS-issued read() originates from a client-side read or from a subscription request.

Adding a merging cache inside CAS would add a lot of complexity and threaten the stability of CAS. It might be easier to have the server app do whatever is needed, and have it always post complete containers (a superset of the DBR types), so that CAS's smart-copy into any prototype clone GDD will always fill the requested structure completely.
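A rough sketch of the server-side approach, again with plain dicts standing in for GDD containers. `CompleteContainerPoster`, `smart_copy` and the two prototypes are hypothetical names for illustration and do not correspond to actual Gateway, pcaspy, libcas or libgdd code.

```python
from copy import deepcopy

# Hypothetical prototypes for two different client requests.
CTRL_PROTOTYPE = {"value": 0.0, "status": 0, "severity": 0,
                  "units": "", "precision": 0}
STS_PROTOTYPE = {"value": 0.0, "status": 0, "severity": 0}


def smart_copy(prototype, posted):
    """Model of the smart-copy: transfer only the fields the prototype asks for."""
    clone = deepcopy(prototype)
    for key in clone:
        if key in posted:
            clone[key] = posted[key]
    return clone


class CompleteContainerPoster:
    """Server-side helper that remembers all fields and always posts all of them."""

    def __init__(self, **initial):
        self._state = dict(initial)

    def post(self, **changed):
        # Merge the changed fields into the remembered state, then hand out
        # a copy of the complete container (the superset of all fields).
        self._state.update(changed)
        return dict(self._state)


poster = CompleteContainerPoster(
    value=1.5, status=0, severity=0, units="mA", precision=3
)

# Only the value changed, but the posted container is still complete.
posted = poster.post(value=2.5)

ctrl_update = smart_copy(CTRL_PROTOTYPE, posted)
sts_update = smart_copy(STS_PROTOTYPE, posted)

print(ctrl_update["units"])  # mA
print(sorted(sts_update))    # ['severity', 'status', 'value']
```

Because the posted container is a superset, the same post serves both a subscription that asked for the control structure and one that asked only for value and status; each prototype clone picks out the fields it needs.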

georgweiss commented 5 years ago

At ESS this has triggered complaints. Initially CS Studio was blamed. Is this bug getting any attention at all?

krisztianloki commented 5 years ago

I've been trying to understand the code and fix this issue for quite some time now, and I think I have a solution that at the very least should fix the bug for CS Studio. @ralphlange, can you please take a look at it? Thanks!