Closed blondejamtart closed 4 years ago
@thomascobb, I'd be interested in your ideas for this. One option would be for caget/caput to always wait for reconnection ... but that would be a possibly incompatible change.
Edit: It looks like this is exactly what it does now, so whatever Brian is seeing is something else.
@btester271828 , I'm not able to reproduce this in a simple way. In my simple test, I do:
It looks like cothread correctly waits for the PV to reconnect, timing out if this doesn't happen in time. Can you please give a small self-contained demonstration of this problem?
Edit: It's possible that what we're seeing here is a race condition, with the PV disconnecting between caget waiting for the connection to complete and interrogating the channel for its underlying data type. If so, I doubt this is fixable.
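To make the suspected window concrete, here is a toy model of the check-then-use sequence. It is only a sketch: `FakeChannel`, `read_type` and the `between` hook are hypothetical names invented for illustration and are not part of cothread; the real sequence inside caget may differ.

```python
class FakeChannel:
    """Toy stand-in for a CA channel (hypothetical, not cothread's class)."""

    def __init__(self):
        self.connected = True
        self.datatype = "DBR_LONG"

    def disconnect(self):
        # Simulates the IOC going away at an arbitrary moment.
        self.connected = False

    def wait_connected(self):
        # Step 1: report whether the connection has completed.
        return self.connected

    def field_type(self):
        # Step 2: ask the channel for its underlying data type.
        if not self.connected:
            raise RuntimeError("channel disconnected")
        return self.datatype


def read_type(channel, between=lambda: None):
    """Check the connection, then use the channel.

    `between` runs in the gap between the check and the use, which is
    where a disconnect would have to land to trigger the failure.
    """
    if not channel.wait_connected():
        raise TimeoutError("channel did not connect")
    between()
    return channel.field_type()
```

Calling `read_type(ch)` on a connected channel returns the type, while `read_type(ch, between=ch.disconnect)` passes the connection check and then fails on the second step, which is the shape of the failure described above.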
The specific case I had was:
I think as long as it reconnected within the timeout then that would be fine...
The following python script reproduces the error (will need tweaks for importing cothread/numpy):
import sys
import os
import subprocess
import time
from pkg_resources import require
require("numpy")
sys.path.append("/scratch/myr45768/Git/cothread")
from cothread import catools
# epics_base = '/scratch/myr45768/Git/epics-base'
epics_base = '/dls_sw/epics/R3.14.12.7/base'
softIoc_bin = epics_base + "/bin/linux-x86_64/softIoc"
# load some pre-existing template & define macros for it
db_template = '/dls/technical/controls/myr45768/pymalcolm/malcolm/modules/system/db/system.template'
stats = dict()
sys_call_bytes = open('/proc/%s/cmdline' % os.getpid(), 'rb').read().split(b'\0')
sys_call = [el.decode("utf-8") for el in sys_call_bytes]
stats["pymalcolm_path"] = os.path.abspath(sys_call[1])
stats["yaml_path"] = os.path.abspath(sys_call[2])
stats["yaml_ver"] = "bugMaker"
stats["pymalcolm_ver"] = "not pymalcolm"
hostname = os.uname()[1]
stats["kernel"] = "%s %s" % (os.uname()[0], os.uname()[2])
stats["hostname"] = hostname if len(hostname) < 39 else hostname[:35] + '...'
pid = os.getpid()
stats["pid"] = pid
simultaneous = 10
iocs = []
db_macros = []
for i in range(simultaneous):
    iocs += [None]
    db_macros += [None]
for i in range(len(iocs)):
    db_macros[i] = "prefix='pc0111-BUG-R01-%02d'" % (i + 1)
    for key, value in stats.items():
        db_macros[i] += ",%s='%s'" % (key, value)
# done defining db template, launch some IOCs!
for repeats in range(100):
    print("Iteration %d" % repeats)
    for i in range(len(iocs)):
        iocs[i] = subprocess.Popen(
            softIoc_bin + " -m " + db_macros[i] + " -d " + db_template + " 2> err",
            stdout=subprocess.PIPE, stdin=subprocess.PIPE, shell=True)
    time.sleep(0.5)
    errored = False
    for i in range(len(iocs)):
        val = catools.caget('pc0111-BUG-R01-%02d:PID' % (i + 1))
    for ioc in iocs:
        ioc.terminate()
    time.sleep(0.5)
This looks like a race condition between checking whether the channel is connected and actually using it. With asynchronous CA handling (see commit 897a29fdd4fa0560b63e135139d86ae69557d7d3), avoiding this is fundamentally impossible.
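If the race cannot be closed inside the library, one caller-side workaround is to retry the call a few times. The helper below is a minimal sketch under that assumption, not anything cothread provides: `call_with_retry` and its parameters are hypothetical names, and which exception the race actually raises is not established in this thread, so `retry_on` is left for the caller to narrow.

```python
import time


def call_with_retry(fetch, attempts=3, delay=0.1,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call fetch() until it succeeds or `attempts` tries are used up.

    fetch    -- zero-argument callable, e.g. a lambda wrapping caget
    retry_on -- exception types treated as transient (assumption: the
                caller knows which error the race produces)
    sleep    -- pause function; injectable so a cooperative sleep can
                be used instead of time.sleep
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch()
        except retry_on:
            # Re-raise on the final attempt so the caller sees the error.
            if attempt + 1 == attempts:
                raise
            sleep(delay)
```

With the reproduction script above, usage might look like `call_with_retry(lambda: catools.caget(pv, timeout=5))`, where `pv` is the PV name. In a cothread program it is probably better to pass `sleep=cothread.Sleep` so the pause does not block other cothreads. Retrying only narrows the window; it does not remove the race.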
Closing this as cannot sensibly fix.
There doesn't seem to be any way to wait for the channel to reconnect.