APS-USAXS / usaxs-bluesky-ended-2023

Bluesky instrument for USAXS
0 stars 0 forks source link

Exception : state PV not available #577

Open jilavsky opened 2 years ago

jilavsky commented 2 years ago

After APS beam dump we had failure in access to 9idcLAX:state PV

|        72 | 20:32:51.9 |    1666667 |      24253 |        797 |                                                                                                          
|        73 | 20:32:52.3 |    1666667 |      24728 |        721 |                                                                                                          
|        74 | 20:32:52.8 |    1666667 |      25820 |        671 |                                                                                                          
+-----------+------------+------------+------------+------------+      
 
generator uascan ['f9b3d524'] (scan num: 96)                                                                                                                               
E Wed-20:32:54 - Exception FailedStatus during attempt 1 of 1 of command '('USAXSscan', ['19', '40', '1', 'AirBlank_R'], 76, '      USAXSscan        19      40      1      "AirBlank_R"')''                                                                                                                                                           
subject: FailedStatus during attempt 1 of 1 of command '('USAXSscan', ['19', '40', '1', 'AirBlank_R'], 76, '      USAXSscan        19      40      1      "AirBlank_R"')''
 
date: 2022-11-16 20:32:54.723736
command file: /share1/USAXS_data/2022-11/usaxs.mac
line number: 76
command: ('USAXSscan', ['19', '40', '1', 'AirBlank_R'], 76, '      USAXSscan        19      40      1      "AirBlank_R"')
raw command:       USAXSscan        19      40      1      "AirBlank_R"
attempt: 1 of 1
exception: Status(obj=EpicsSignal(read_pv='9idcLAX:state', name='user_data_state', parent='user_data', value='counting 75/300', timestamp=1668652373.179653, auto_monitor=True, string=True, write_pv='9idcLAX:state', limits=False, put_complete=False), done=True, success=False)
Stopping further processing of this command list.

Not sure this has anything to do with APS beam dump, but happened after suspender properly suspended operations and recovered. Disappointing, all worked before to die this way. Beam dump was around point 45, so we collected another 30 points here.

jilavsky commented 2 years ago

Observation - we have this state PV not responding routinely, but typically it is non-destructive see below. WHy was the one above destructive Exception?

       186 | 09:21:52.5 |    5000000 |     113382 |      16934 |                                                                                                          
W Fri-09:21:52 - Exception while reporting instrument state: Status(obj=EpicsSignal(read_pv='9idcLAX:state', name='user_data_state', parent='user_data', value='counting 187/300', timestamp=1668784912.858422, auto_monitor=True, string=True, write_pv='9idcLAX:state', limits=False, put_complete=False), done=True, success=False)
|       187 | 09:21:53.4 |    5000000 |     113632 |      16272 |                                                                                                          
|       188 | 09:21:54.2 |    5000000 |     113659 |      15097 |                                                                                                          
|       189 | 09:21:54.9 |    5000000 |     113672 |      14403 |       
prjemian commented 2 years ago

What does this mean?

non-destructive

I'm thinking it may be necessary to avoid using a Status object when setting this PV. To do that, the .put(message) must be launched in a thread (we do not really care about interrupting the main process for this one).

jilavsky commented 2 years ago

Launching in thread is a solution, if it does not hang the main data collection thread. The BlackFly in #573 seems to hand main data collection thread and aborting simply resumes the data collection. So I am bit confused on the thread dependence. Reading again the original report, realized that there are two Exceptions : First :

E Wed-20:32:54 - Exception FailedStatus during attempt 1 of 1 of command '('USAXSscan', ['19', '40', '1', 'AirBlank_R'], 76, '      USAXSscan        19      40      1      "AirBlank_R"')''                                                                                                                                                           
subject: FailedStatus during attempt 1 of 1 of command '('USAXSscan', ['19', '40', '1', 'AirBlank_R'], 76, '      USAXSscan        19      40      1      "AirBlank_R"')''

Second

exception: Status(obj=EpicsSignal(read_pv='9idcLAX:state', name='user_data_state', parent='user_data', value='counting 75/300', timestamp=1668652373.179653, auto_monitor=True, string=True, write_pv='9idcLAX:state', limits=False, put_complete=False), done=True, success=False)
Stopping further processing of this command list.

I cannot figure out which one is primary or which may be secondary - or simply why did the data collection failed here? As noted, "Exception while reporting instrument state:" does not seem to stop data collection, it seems to be ignored. I see it during data collection relatively routinely and it does not do anything bad, simply generates message. So what was different here and how can we prevent that.

prjemian commented 2 years ago

As it turns out, there is a plan method intended to avoid exactly this problem (send the text to the PV but do not return a Status object): https://github.com/APS-USAXS/usaxs-bluesky/blob/a803031b8c806c74949ff53fab1ef4cc3cb2f678/instrument/devices/user_data.py#L104-L113

We call this from many places:

usaxs@usaxscontrol ~/bluesky $ git grep set_state_plan
instrument/devices/usaxs_fly_scan.py:                yield from user_data.set_state_plan("writing fly scan HDF5 file")
instrument/devices/usaxs_fly_scan.py:        yield from user_data.set_state_plan("fly scan finished")
instrument/devices/user_data.py:    def set_state_plan(self, msg, confirm=True):
instrument/plans/mode_changes.py:    yield from user_data.set_state_plan(
instrument/plans/mode_changes.py:    yield from user_data.set_state_plan(
instrument/plans/mode_changes.py:    yield from user_data.set_state_plan("Ready for BlackFly imaging mode")
instrument/plans/mode_changes.py:    yield from user_data.set_state_plan("Moving USAXS to USAXS mode")
instrument/plans/mode_changes.py:        yield from user_data.set_state_plan("USAXS Mode")
instrument/plans/mode_changes.py:    yield from user_data.set_state_plan("Moving USAXS to SAXS mode")
instrument/plans/mode_changes.py:    yield from user_data.set_state_plan("SAXS Mode")
instrument/plans/mode_changes.py:    yield from user_data.set_state_plan("Moving USAXS to WAXS mode")
instrument/plans/mode_changes.py:    yield from user_data.set_state_plan("WAXS Mode")
instrument/plans/mode_changes.py:    yield from user_data.set_state_plan("Radiography Mode")
instrument/plans/mode_changes.py:    yield from user_data.set_state_plan("Moving USAXS to Imaging mode (same as USAXS mode now)")
instrument/plans/mode_changes.py:    yield from user_data.set_state_plan("Moving USAXS to OpenBeamPath mode")
instrument/plans/mode_changes.py:        yield from user_data.set_state_plan("USAXS moved to OpenBeamPath mode")
instrument/plans/requested_stop.py:        yield from user_data.set_state_plan(msg)
instrument/plans/resets.py:    yield from user_data.set_state_plan("resetting motors")
instrument/plans/resets.py:    yield from user_data.set_state_plan("USAXS reset complete")
instrument/plans/sample_transmission.py:    yield from user_data.set_state_plan("Measure USAXS transmission")
instrument/plans/sample_transmission.py:    yield from user_data.set_state_plan("Measure SAXS transmission")
instrument/plans/scans.py:    yield from user_data.set_state_plan("Moving to Q=0")
instrument/plans/scans.py:    yield from user_data.set_state_plan("Running USAXS step scan")
instrument/plans/scans.py:    yield from user_data.set_state_plan("USAXS step scan finished")
instrument/plans/scans.py:    yield from user_data.set_state_plan("Moving USAXS back and saving data")
instrument/plans/scans.py:    yield from user_data.set_state_plan("Moving to Q=0")
instrument/plans/scans.py:    yield from user_data.set_state_plan("Running Flyscan")
instrument/plans/scans.py:    yield from user_data.set_state_plan("Flyscan finished")
instrument/plans/scans.py:    yield from user_data.set_state_plan("Moving USAXS back and saving data")
instrument/plans/uascan.py:            yield from user_data.set_state_plan(f"moving motors {i+1}/{intervals}")
instrument/plans/uascan.py:            yield from user_data.set_state_plan(f"counting {i+1}/{intervals}")
instrument/plans/uascan.py:        yield from user_data.set_state_plan("returning AR, AX, SY, and DX")

but there are more cases (slightly more tedious search) where it appears we do not use this plan to set the user_data.state signal:

usaxs@usaxscontrol ~/bluesky $ git grep state | grep -v set_state_plan | grep user_data.state 
instrument/devices/usaxs_fly_scan.py:                user_data.state._set_thread = None
instrument/plans/command_list.py:        user_data.state, "Starting data collection",
instrument/plans/command_list.py:        user_data.state, "USAXS macro file done",  # exact text triggers the music
instrument/plans/requested_stop.py:            user_data.state, "Aborted data collection",
instrument/plans/scans.py:        user_data.state, "pre-USAXS optics tune",
instrument/plans/scans.py:        user_data.state,            "pre-USAXS optics tuning done",
instrument/plans/scans.py:        user_data.state, "pre-USAXS optics tune",
instrument/plans/scans.py:        user_data.state,            "pre-USAXS optics tuning done",
instrument/plans/scans.py:    #     user_data.state, "pre-SWAXS optics tune",
instrument/plans/scans.py:    #     user_data.state,            "pre-SWAXS optics tuning done",
instrument/plans/scans.py:        user_data.state, "starting USAXS step scan",
instrument/plans/scans.py:        user_data.state, "starting USAXS Flyscan",
instrument/plans/scans.py:        user_data.state, "starting SAXS collection",
instrument/plans/scans.py:            user_data.state, f"SAXS collection for {terms.SAXS.acquire_time.get()} s",
instrument/plans/scans.py:        user_data.state, "Done SAXS",
instrument/plans/scans.py:        user_data.state, "starting WAXS collection",
instrument/plans/scans.py:            user_data.state,
instrument/plans/scans.py:        user_data.state, "Done WAXS",
instrument/plans/tune_guard_slits.py:        user_data.state, "tune Guard slits center",
instrument/plans/tune_guard_slits.py:        user_data.state, "autoranging the PD",
instrument/plans/tune_guard_slits.py:    yield from bps.mv(user_data.state, title)
instrument/plans/tune_guard_slits.py:    yield from bps.mv(user_data.state, "autoranging the PD")
prjemian commented 2 years ago

We can probably avoid this problem in the future by examining each of those files and make certain to replace any use of:

yield from bps.mv(user_data.state, some_text)

with

yield from user_data.set_state_plan(some_text)

This might involve separating the posting of a new state message from other signals in the same call to bps.mv() such as this block: https://github.com/APS-USAXS/usaxs-bluesky/blob/a803031b8c806c74949ff53fab1ef4cc3cb2f678/instrument/plans/command_list.py#L118-L122

jilavsky commented 2 years ago

Great, so this os basically oversight on our side - we already implemented patch for this epics-BS communication bug (on this specific PV).
Sounds like I need to correct all of these. Looks like simple monkey work. I think this is simple search, separate if needed, and replace, no additional imports needed. Correct?

prjemian commented 2 years ago

Hmmm... looking into the actual line of code for the exception reported above: https://github.com/APS-USAXS/usaxs-bluesky/blob/a803031b8c806c74949ff53fab1ef4cc3cb2f678/instrument/plans/uascan.py#L222

Since the status object reports it was trying to use: value='counting 75/300', I'm not convinced we have a final answer yet. But it would be great for the code to become consistent, no matter.

prjemian commented 2 years ago

Observation - we have this state PV not responding routinely, but typically it is non-destructive see below. WHy was the one above destructive Exception?

Inconsistent use. In some cases, exception is caught and handled (so non-destructive) while in others, the exception bubbles up to the top and ends the plan.

prjemian commented 2 years ago

Summary, so far: Probably not solved.

Continue to make all state updates go through user_data.set_state_plan(). That gives me one place to implement a different handling.

This line might be the troublemaker: https://github.com/APS-USAXS/usaxs-bluesky/blob/a803031b8c806c74949ff53fab1ef4cc3cb2f678/instrument/devices/user_data.py#L108

Default value for confirm=True


A replacement might be like this:

 def set_state_plan(self, msg, confirm=True): 
     """plan: tell EPICS about what we are doing"""

     @apstools.util.run_in_thread
     def caput_state(msg):
          self.state.put(msg)

     yield from bps.null()  # makes this method a generator function (a bluesky "plan")
     caput_state(trim_string_for_EPICS(msg)) 
prjemian commented 2 years ago

note https://github.com/APS-USAXS/usaxs-bluesky/blob/a803031b8c806c74949ff53fab1ef4cc3cb2f678/instrument/devices/user_data.py#L19

so do the same for the run_in_thread decorator.

jilavsky commented 2 years ago

I converted all calls to yield from user_data.set_state_plan(some_text) method. Now we just need to make sure this does not bomb on us when we use it...