Open rerpha opened 1 month ago
This has been narrowed down to a beckhoff problem rather than a reflectometry server problem (sample_y
is a beckhoff virtual axis on INTER). @rerpha and I will draft an email to motion team tomorrow about it with some diagnostics we've been able to collect.
Specifically, to reproduce:
sample_y
(or another virtual axis such as measurement_angle
) to 20
caput MOT:MTR0802 19 && caput MOT:MTR0802 18
(assuming MOT:MTR0802
is sample_y
)
18
isn't actually sent to the beckhoff immediately, rather it is sent soon after the initial move to 19 completes. Delay is approx 70ms
. Therefore I think the problem here is that the beckhoff is reporting that the initial move is "complete" too soon - before it is actually ready to accept a subsequent move command.Notes to self if we want to try to hide/compensate for this problem in the IBEX side rather than fixing it on the beckhoff end:
.DLY
field of motor record - see if this helps? Unclear whether this delay would get added between two moves caused by retargeting, or whether it will only get added right at the end.we narrowed this down to not being refl server related.
the issue seems to be two SPs in quick succession to the beckhoff virtual axes, which seem to get registered as SPs but never executed. we have been testing this without epics entirely to prove it with scripts like this:
import pyads
import time
def move(plc, pos, axis, velocity):
# does exactly what the motor record does for an absolute move
plc.write_by_name(f"GVL.astAxes[{axis}].stControl.fVelocity", velocity)
plc.write_by_name(f"GVL.astAxes[{axis}].stControl.fPosition", pos)
plc.write_by_name(f"GVL.astAxes[{axis}].stControl.eCommand", 0, pyads.PLCTYPE_INT)
plc.write_by_name(f"GVL.astAxes[{axis}].stControl.bExecute", 1)
# connect to plc and open connection
def poll_instead_of_sleeping(plc, axis, t):
start = time.time()
n = 0
while time.time() < start + t:
plc.read_by_name(f"GVL.astAxes[{axis}].stStatus.bMoving")
n += 1
print(f"spammed {n} bmobings")
PRE_DELAY = 0.5
def waitfor_move(plc, axis, post_delay):
start = time.time()
time.sleep(PRE_DELAY)
while True:
time.sleep(0.2)
is_moving = plc.read_by_name(f"GVL.astAxes[{axis}].stStatus.bMoving")
pos = plc.read_by_name(f"GVL.astAxes[{axis}].stStatus.fActPosition")
sp = plc.read_by_name(f"GVL.astAxes[{axis}].stControl.fPosition")
if is_moving != 1:
break
time.sleep(post_delay)
end = time.time()
print(f"waitfor_move time {end-start}, final pos {pos}, final sp {sp}")
return (pos, sp)
def do_test(post_delay):
plc = pyads.Connection('192.168.0.25.1.1', 852)
with plc:
axis = 26
plc.write_by_name(f"GVL.astAxes[{axis}].stControl.bReset", 1)
p = plc.read_by_name(f"GVL.astAxes[{axis}].stStatus.fActPosition")
v = 2
print(f"start position is {p}")
#initial move
move(plc, 20, axis, v)
# wait a bit
#time.sleep(5)
# poll_instead_of_sleeping(plc, axis, 5)
(p, sp) = waitfor_move(plc, axis, post_delay)
if sp != 20.0:
print("IT DIDN'T TAKE THE STUPID SP")
if p < 19.99 or p > 20.01:
return False
# final_pos = plc.read_by_name(f"GVL.astAxes[{axis}].stStatus.fActPosition")
# if final_pos < 19.99 or final_pos > 20.01:
# return False
# send three moves in succession
move(plc, 18, axis, v)
(p, sp) = waitfor_move(plc, axis, post_delay)
if sp != 18.0:
print("IT DIDN'T TAKE THE STUPID SP MK2")
if p < 17.99 or p > 18.01:
return False
# for _ in range(100):
# plc.read_by_name(f"GVL.astAxes[{axis}].stStatus.bMoving")
error_num =plc.read_by_name(f"GVL.astAxes[{axis}].stStatus.nErrorID")
print(f"errors are {error_num}")
move(plc, 16, axis, v)
(p, sp) = waitfor_move(plc, axis, post_delay)
final_pos = plc.read_by_name(f"GVL.astAxes[{axis}].stStatus.fActPosition")
final_sp = plc.read_by_name(f"GVL.astAxes[{axis}].stControl.fPosition")
print(f"final sp is {final_sp}, position is {final_pos}")
return 15.99 < final_pos < 16.01
def do_full_test():
m = {}
for post_delay in range(0, 5000, 2):
print(f"Testing post_delay {post_delay}")
n = 0
passed = 0
for it in range(50):
print(it)
good = do_test(post_delay/1000.0)
if good:
passed += 1
n += 1
print(f"{100*(passed/n)}% good")
m[post_delay] = 100*(passed/n)
print(m)
do_full_test()
which produced a plot like this:
which suggests that a 200ms delay after a SP sent seems to be the sweet spot for not having moves ignored.
some of this may be caused by polling the beckhoff which slows things down - #7618 helped thisby reducing the poll rate from 1ms to 200ms so this may help further ( though can't easily be patched onto the beckhoff instruments )
A settle time can be added to a move from the motor record by setting the .DLY field. This script has been testing successful moves for this and this seems to stop it from happening when .DLY is set to 0.5 (500ms):
from genie_python import genie as g
MTR = "MOT:MTR0802"
beckhoff_axis = 26
beckhoff_errors = 0
position_errors = []
success = 0
unknown_errors = 0
g.toggle.exceptions_raised(True)
def reset_all_beckhoff_errors():
for i in range (1, 30+1):
g.set_pv(f"TC_01:ASTAXES_{i}:STCONTROL-BRESET", 1, is_local=True)
g.set_pv(f"TC_01:ASTAXES_{i}:STCONTROL-BENABLE", 1, is_local=True)
def move_and_check():
global beckhoff_errors, success
# clear beckhoff errors
reset_all_beckhoff_errors()
# move to the initial position
g.set_pv(MTR + ".VAL", 30, is_local=True)
g.waitfor_move()
err = g.get_pv(f"TC_01:ASTAXES_{beckhoff_axis}:STSTATUS-BERROR", is_local=True)
if err != 0:
beckhoff_errors += 1
return
pos = g.get_pv(MTR + ".RBV", is_local=True)
assert 29.99 < pos < 30.01, f"initial: {pos}"
g.set_pv(MTR + ".VAL", 19, is_local=True)
import time
time.sleep(1)
g.set_pv(MTR + ".VAL", 18, is_local=True)
g.waitfor_move()
err = g.get_pv(f"TC_01:ASTAXES_{beckhoff_axis}:STSTATUS-BERROR", is_local=True)
if err != 0:
beckhoff_errors += 1
return
pos = g.get_pv(MTR + ".RBV", is_local=True)
assert 17.99 < pos < 18.01, f"final: {pos}"
success += 1
def test_race():
global success
global unknown_errors
global beckhoff_errors
while (True):
try:
move_and_check()
except AssertionError as e:
position_errors.append(e)
except KeyboardInterrupt:
break
except Exception as e:
print(f"unexpected exception: {e.__class__.__name__}: {str(e)}")
unknown_errors += 1
print(f"num position errors: {len(position_errors)}, num unknown errors: {unknown_errors}, success = {success} beckhoff_errors = {beckhoff_errors}")
print(position_errors)
we are now testing this for less settle time (200ms) to reduce waitfor delays etc. without losing robustness. Edit: 200ms is fine as well - we'll use this for all beckhoff axes going forwards.
INTER emailed to ask for some testing time with real hardware.
we may be able to change https://github.com/ISISComputingGroup/EPICS-motor/blob/master/motorApp/Db/basic_asyn_motor.db if not used elsewhere other than TC and just pass in DLY of 0.2 in https://github.com/ISISComputingGroup/EPICS-ioc/blob/master/TC/TC-IOC-01App/Db/single_axis.substitutions
when:
putting SM IN, then immediately changing sm_angle (or within ~5s), then sample_y not updating based on the second sm_angle - it seems to use the first angle when setting SM IN instead of the new one. this isn't consistent and sometimes works OK. I think it's an issue within the reflectometry server where we aren't syncing the beam path frequently enough.
acceptance criteria
notes
we can simulate an INTER beamline either by simulating the whole thing through a twincat instance or just by using galils in RECSIM. this may help to solve the issue. If not it is somewhat reproducible on the real INTER.
Discussed ~1:36 20/06/24