ISISComputingGroup / IBEX

Top level repository for IBEX stories

WISH ORC: jolting then offsetting #7116

Closed rerpha closed 2 years ago

rerpha commented 2 years ago

The WISH collimator jolts and then oscillates about an offset position, which suggests that either 0 is being set incorrectly or motor steps are being missed. Previously we thought this could be caused by an incorrect home signal (and that it happened after a maintenance rotation), but we have narrowed the trigger down to a config change or IOC restart. The actual db is OK, especially after #7025; however, something lower-level is going on.

Acceptance Criteria

Extra Information

Things we tested:

Using our testbed Galil, with a rotary dial attached to simulate a collimator, we found the cause after several days of narrowing it down. userdef_records8.db sets lots of user variables per axis for diagnostic/debugging purposes, but it does so all at once. I tested this by doing the same thing with a Python script, as follows:

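import random
from time import sleep

# Assumption: `g` is the genie_python API, which is predefined in an IBEX scripting console
from genie_python import genie as g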
vars = ["K1A","K1B","K1C","K1D","K1E","K1F","K1G","K1H","K2A","K2B","K2C","K2D","K2E","K2F","K2G","K2H","K3A","K3B","K3C","K3D","K3E","K3F","K3G","K3H","ILA","ILB","ILC","ILD","ILE","ILF","ILG","ILH","FVA","FVB","FVC","FVD","FVE","FVF","FVG","FVH","FCA","FCB","FCC","FCD","FCE","FCF","FCG","FCH","FAA","FAB","FAC","FAD","FAE","FAF","FAG","FAH","FNA","FNB","FNC","FND","FNE","FNF","FNG","FNH","ZPA","ZPB","ZPC","ZPD","ZPE","ZPF","ZPG","ZPH","ZNA","ZNB","ZNC","ZND","ZNE","ZNF","ZNG","ZNH","TLA","TLB","TLC","TLD","TLE","TLF","TLG","TLH","CPA","CPB","CPC","CPD","CPE","CPF","CPG","CPH","CTA","CTB","CTC","CTD","CTE","CTF","CTG","CTH","AFA","AFB","AFC","AFD","AFE","AFF","AFG","AFH","DSA","DSB","DSC","DSD","DSE","DSF","DSG","DSH","DBA","DBB","DBC","DBD","DBE","DBF","DBG","DBH"]

# Set every user variable in quick succession, as the PINI records do at IOC start
for var in vars:
    g.set_pv("TE:NDW1836:MOT:DMC04:SEND_STR_CMD", f"{var}={random.randint(1, 10)}")
    sleep(0.01)

I got the names of these vars by adding some print statements to the Galil driver and seeing what was being set on startup. Several PINIs in the userdef_records db file are responsible for this. Weirdly, polling these variables, and using MG _{var}, doesn't cause the same issue!

This caused the jolting and then the offset oscillation. Freddie's PR, which skips setting these variables if they are unchanged, fixes the issue, as does disabling the userdef record db load. This is the fix, as it stops the variables being set: https://github.com/ISISComputingGroup/EPICS-galil/compare/check_uservar_val_change
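The idea behind the fix can be sketched in Python as a write-if-changed guard. This is only an illustration of the approach, not the driver code from the PR: `UserVarWriter`, its cache, and the injected `send` callable are all hypothetical names, and the real change lives in the Galil driver itself.

```python
from typing import Callable, Dict


class UserVarWriter:
    """Hypothetical helper: forwards a user variable write only when the value changes."""

    def __init__(self, send: Callable[[str], None]) -> None:
        # `send` is assumed to wrap whatever delivers a command string to the
        # controller, e.g. a write to the SEND_STR_CMD PV.
        self._send = send
        self._last_sent: Dict[str, float] = {}

    def set(self, name: str, value: float) -> bool:
        """Send `name=value` unless it matches the last value sent. Returns True if sent."""
        if self._last_sent.get(name) == value:
            return False  # unchanged, so leave the controller alone
        self._send(f"{name}={value}")
        self._last_sent[name] = value
        return True


# Usage: collect the commands in a list instead of talking to real hardware.
sent = []
writer = UserVarWriter(sent.append)
writer.set("K1A", 0)  # first write is always sent
writer.set("K1A", 0)  # repeated value is skipped
writer.set("K1A", 5)  # changed value is sent
```

Note that this sketch compares against the last value it sent, not against what the controller currently holds, so it would still write every variable once after a restart.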

I also tried setting the vars to 0 twice to simulate what was going on on my machine, and this caused the same behaviour. I think this is simply down to var/operand sets being done too quickly. We wouldn't have seen this anywhere else, as we don't have any other constantly moving, position-critical axes on site.
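A rough sketch of that zero-twice test follows. Each variable name in the list above is a two-character prefix followed by an axis letter A to H, so the sketch generates the names rather than listing them. `build_zero_commands` and `send_all` are hypothetical helpers, the "two full passes" ordering is an assumption, and on a real setup `send` would wrap the `g.set_pv` call to `SEND_STR_CMD` used in the script above.

```python
from time import sleep
from typing import Callable, List

# Prefixes and axis letters taken from the variable list in the test script
PREFIXES = ["K1", "K2", "K3", "IL", "FV", "FC", "FA", "FN",
            "ZP", "ZN", "TL", "CP", "CT", "AF", "DS", "DB"]
AXES = "ABCDEFGH"


def build_zero_commands(repeats: int = 2) -> List[str]:
    """Build `NAME=0` for every user variable, repeated as whole passes."""
    one_pass = [f"{prefix}{axis}=0" for prefix in PREFIXES for axis in AXES]
    return one_pass * repeats


def send_all(send: Callable[[str], None], commands: List[str], delay: float = 0.01) -> None:
    """Send each command in order with a short pause between sends."""
    for command in commands:
        send(command)
        sleep(delay)


commands = build_zero_commands()
```

Passing `send` in as a callable keeps the sketch runnable without a controller, for example `send_all(print, commands, delay=0)`.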

There is also a separate issue: loading the controller code (or even just checking that the code is unchanged) pauses the thread responsible for the oscillation in this instance, even when a quiet start is requested. We don't care too much about this, as it's very unlikely that WISH will be taking data AND changing config/restarting the IOC at the same time. After the code check the thread happily resumes and swings through the correct angle etc.

We should patch the change made in Freddie's PR over to WISH and make it permanent, as the problem affects both the old and new Galil drivers, though this may be more difficult to do in the old one if it's handled by the Galil DLL.

TL;DR: setting loads of user vars on the Galil controller seems to make a thread behave unexpectedly and jolt/miss motor steps.

rerpha commented 2 years ago

still to do:

rerpha commented 2 years ago

When we patch this over to WISH we could consider removing the homing routines for anything beyond axis 3/C as they make the pause during code checking longer. Setting them to empty quotes seems to shorten this time!

rerpha commented 2 years ago

old driver fix (not sure if this works yet, still need to test) - https://github.com/ISISComputingGroup/EPICS-galil/pull/65

rerpha commented 2 years ago

The ORC db seems to be updating dist and vel every couple of seconds, due to an RBV scan causing lots of copy processes (we could change this to CPP if the value hasn't changed?). It doesn't seem to be causing any issues though, and certainly not the jolting.

rerpha commented 2 years ago

Hooray, it's fixed! Tested on the real device today: reproduced the jolting and offsetting, then patched over the new DLL with user var checking, and it now seems to be fine after several IOC restarts.

rerpha commented 2 years ago

The fix has been applied to galil-old, but may still need to be merged from https://github.com/ISISComputingGroup/EPICS-galil/tree/check_uservar_val_change into the new driver.

ThomasLohnert commented 2 years ago

Looks good, thanks for the thorough documentation of the issue & investigation!