Closed jilavsky closed 2 years ago
Another beam dump - we got to test this really well today. This also does not look good for continuing with data collection:
I Sat-09:35:32 - sample image file: /share1/USAXS_data/2021-10/10_16_Pyush2/10_16_Pyush2_usaxs/Pyush_Aged_1pct_top_0115.jpg
Transient Scan ID: 27 Time: 2021-10-16 09:35:32
Persistent Unique Scan ID: 'c875114d-fdf7-4ca4-a902-c0709e1ecbce'
New stream: 'baseline'
New stream: 'aps_current_monitor'
I Sat-09:35:32 - HDF5 config: /share1/AreaDetectorConfig/FlyScan_config/saveFlyData.xml
I Sat-09:35:32 - HDF5 output: /share1/USAXS_data/2021-10/10_16_Pyush2/10_16_Pyush2_usaxs/Pyush_Aged_1pct_top_0115.h5
I Sat-09:35:32 - truncating long status message: FlyScanning: Pyush_Aged_1pct_top_0115.h5
D Sat-09:35:32 - progress_reporting has arrived
I Sat-09:35:32 - flying, s ar, deg ay, mm dy, mm channel elapsed, s
D Sat-09:35:37 - 5.00 8.7429294 0.00713 13.15394 0 0.00
D Sat-09:35:42 - 10.01 8.7417444 0.00207 13.13393 42 3.68
D Sat-09:35:47 - 15.01 8.7407596 -0.00180 13.11860 79 8.70
D Sat-09:35:52 - 20.02 8.7378478 -0.01404 13.06992 178 13.70
D Sat-09:35:57 - 25.02 8.7304743 -0.04407 12.94925 378 18.70
D Sat-09:36:02 - 30.03 8.7172640 -0.09842 12.73538 670 23.70
D Sat-09:36:07 - 35.03 8.6959354 -0.18631 12.38989 1038 28.72
D Sat-09:36:12 - 40.03 8.6657630 -0.31082 11.89987 1463 33.72
D Sat-09:36:17 - 45.04 8.6237471 -0.47570 11.24097 1930 38.72
D Sat-09:36:22 - 50.05 8.5707036 -0.69560 10.36091 2431 43.72
D Sat-09:36:27 - 55.05 8.5028882 -0.97277 9.28952 2958 48.72
D Sat-09:36:32 - 60.05 8.4191995 -1.30581 7.95428 3504 53.72
D Sat-09:36:37 - 65.06 8.3188559 -1.71730 6.31367 4069 58.75
D Sat-09:36:42 - 70.06 8.1995525 -2.19109 4.44965 4640 63.72
Suspending....To get prompt hit Ctrl-C twice to pause.
Suspension occurred at 2021-10-16 09:36:42.
Justification for this suspension:
Signal usaxs_CheckBeamStandard is low
Suspending....To get prompt hit Ctrl-C twice to pause.
Suspension occurred at 2021-10-16 09:36:43.
Justification for this suspension:
Signal white_beam_ready_available is low
D Sat-09:36:47 - 75.07 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:36:52 - 80.08 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:36:57 - 85.09 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:37:02 - 90.09 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:37:07 - 95.10 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:37:12 - 100.10 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:37:17 - 105.11 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:37:22 - 110.12 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:37:27 - 115.13 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:37:32 - 120.13 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:37:37 - 125.14 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:37:42 - 130.14 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:37:47 - 135.15 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:37:52 - 140.15 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:37:57 - 145.16 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:38:02 - 150.17 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:38:07 - 155.17 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:38:12 - 160.17 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:38:17 - 165.17 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:38:22 - 170.17 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:38:27 - 175.18 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:38:32 - 180.18 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:38:37 - 185.18 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:38:42 - 190.19 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:38:47 - 195.19 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:38:52 - 200.20 8.1844312 -2.23394 4.28169 4701 65.53
D Sat-09:38:57 - 205.21 8.1844312 -2.23394 4.28169 4701 65.53
I Sat-09:39:02 - 210.02 8.1844312 -2.23394 4.28169 4701 65.53
E Sat-09:39:02 - 210.02007913589478s - progress_reporting timeout!!
This is really weird - why did the stages stop moving? Ar, Dy, and Ay are all standing still after the suspension. BUT, this is an EPICS scan; BS should not be controlling the position of these stages at this time. These stages are following an EPICS-defined trajectory, and no control software should try to control them until they report done. No surprise this will fail. Expectation: when a beam dump happens during a Flyscan, the scan is finished, data are collected, and BS pauses before trying to collect the next data set. But everything before we try to collect the next data set should be finished: all the "decorations", cleanup, etc. At this moment the instrument is in an unknown state at weird positions, and recovery can be difficult.
Conclusion: we may need to implement our own suspender if we cannot improve on the existing one.
Marking high priority as this will be complicated to fix, but with current APS operations this is critically needed.
TODO: Suspender needs to enable feedback for the 100 sec recovery and then set it back to its prior value.
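The save, enable, restore pattern in this TODO can be sketched in plain Python. This is only an illustration under stated assumptions: FakeSignal and temporarily_set are hypothetical names (not part of bluesky, ophyd, or the instrument code), and the feedback control is assumed to behave like an object with get() and put(). In a real suspender this would more likely be expressed as plans run before and after the suspension; that wiring is not shown here.

```python
from contextlib import contextmanager


class FakeSignal:
    """Hypothetical stand-in for a control-system signal with get()/put()."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value

    def put(self, value):
        self._value = value


@contextmanager
def temporarily_set(signal, value):
    """Set signal to value for the duration of the block, then restore it."""
    prior = signal.get()       # remember the prior setting
    signal.put(value)          # e.g. enable feedback during recovery
    try:
        yield prior
    finally:
        signal.put(prior)      # restore, even if the recovery step raised


feedback_on = FakeSignal(0)    # assumption: 0 = feedback off, 1 = feedback on
with temporarily_set(feedback_on, 1):
    during = feedback_on.get() # the recovery wait would happen here
after = feedback_on.get()
print(during, after)
```

The try/finally means the prior value is restored even when the recovery step fails, which matters if the instrument is already in an unexpected state.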
NOTE - the suspender seems smarter than expected. After the beam came back, the flyscan was re-run and data were collected, and they look good. It is surprising that the suspender retries the data collection??? Weird note: the dy controller had an error after the suspender recovered. I had to re-enable it manually, even though it might be reset by BS at some point also. Not sure what caused the controller error in the first place. I momentarily disabled and re-enabled the motor and the error went away.
That's how it should work. There is a yield from bps.checkpoint() between scans. RE rewinds after resume to the last checkpoint.
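A toy model may help show why the interrupted flyscan ran again while the earlier scans did not. This is not the RunEngine's implementation: run_with_rewind and the message strings are made up for illustration, and the only behavior modeled is "after a suspension, resume from the most recent checkpoint".

```python
def run_with_rewind(messages, interrupt_at):
    """Toy model of checkpoint/rewind; NOT the bluesky implementation.

    messages: list of strings; "checkpoint" marks a safe rewind point.
    interrupt_at: index of the message during which the suspension hits.
    Returns a log of what happened, in order.
    """
    log = []
    last_checkpoint = 0
    interrupted = False
    i = 0
    while i < len(messages):
        msg = messages[i]
        if msg == "checkpoint":
            last_checkpoint = i
        if i == interrupt_at and not interrupted:
            interrupted = True
            log.append(f"suspended during {msg}")
            i = last_checkpoint    # rewind: replay from the last checkpoint
            continue
        log.append(msg)
        i += 1
    return log


plan = ["checkpoint", "flyscan A", "checkpoint", "flyscan B",
        "checkpoint", "flyscan C"]
log = run_with_rewind(plan, interrupt_at=3)   # beam dump during flyscan B
print(log)
```

In this model flyscan A is never repeated because a checkpoint sits between it and flyscan B; only the work after the last checkpoint is replayed.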
Well, that is actually smarter than I expected and well designed. I stand corrected. We are back to only the first issue - why did the first attempt fail and stop running the scans, and can we prevent that next time?
The RunEngine will stop any positioners it is running. This is by design. I need to look up the exact rule on this. If the RE was controlling the motor for a scan, then its stop() will be called.
Questions I have:
When is stop() called, exactly?
# During suspend, all motors should be stopped. Call stop() on
# every object we ever set().
self._stop_movable_objects(success=True)
Here's the RE's _stop_movable_objects() code:
def _stop_movable_objects(self, *, success=True):
    "Call obj.stop() for all objects we have moved. Log any exceptions."
    for obj in self._movable_objs_touched:
        try:
            stop = obj.stop
        except AttributeError:
            self.log.debug("No 'stop' method available on %r", obj)
        else:
            try:
                stop(success=success)
            except Exception:
                self.log.exception("Failed to stop %r.", obj)
So, we need an answer about item 2 above.
The RE's rewind feature (via checkpoints) is a principal reason that we use bluesky plans (Python's generator functions and the yield from plan() syntax) rather than straight Python functions.
So it is not just any movable object, but rather the membership in self._movable_objs_touched (where self refers to the RE).
A movable is added to that set when the RE receives a Msg("set", movable, position) message. Seems inevitable.
async def _set(self, msg):
    """
    Set a device and cache the returned status object.
    Also, note that the device has been touched so it can be stopped upon
    exit.
    Expected message object is
        Msg('set', obj, *args, **kwargs)
    where arguments are passed through to `obj.set(*args, **kwargs)`.
    """
    kwargs = dict(msg.kwargs)
    group = kwargs.pop('group', None)
    self._movable_objs_touched.add(msg.obj)
    ...
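The two excerpts above fit together as follows, shown here as a small stand-alone model. ToyMotor and ToyEngine are invented for illustration and are not bluesky or ophyd classes; the only thing copied from the excerpts is the idea that every object that was set is remembered and later stopped on suspend.

```python
class ToyMotor:
    """Hypothetical stand-in for a positioner."""

    def __init__(self, name):
        self.name = name
        self.position = None
        self.stopped = False

    def set(self, position):
        self.position = position

    def stop(self, *, success=False):
        self.stopped = True


class ToyEngine:
    """Toy model of the bookkeeping only; NOT the RunEngine."""

    def __init__(self):
        self._movable_objs_touched = set()

    def set(self, obj, position):
        # Remember every object we were asked to move.
        self._movable_objs_touched.add(obj)
        obj.set(position)

    def suspend(self):
        # On suspend, stop everything we ever touched.
        for obj in self._movable_objs_touched:
            obj.stop(success=True)


engine = ToyEngine()
ar, ay, dy = ToyMotor("ar"), ToyMotor("ay"), ToyMotor("dy")
other = ToyMotor("other")          # never set through the engine
for motor, start in ((ar, 8.74), (ay, 0.0), (dy, 13.15)):
    engine.set(motor, start)       # e.g. moving to the scan start position
engine.suspend()
stopped = [m.name for m in (ar, ay, dy, other) if m.stopped]
print(stopped)
```

In this model a motor is stopped on suspend even if something else (such as an EPICS-driven trajectory) is moving it at that moment, simply because the engine set it once earlier in the same call.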
Time to scratch our heads and scheme.
The list is cleared (same file) during this code:
def _clear_call_cache(self):
    "Clean up for a new __call__ (which may encompass multiple runs)."
    self._metadata_per_call.clear()
    self._staged.clear()
    self._objs_seen.clear()
    self._movable_objs_touched.clear()
    ...
and that code (_clear_call_cache()) is called when a plan is executed. That is, when calling RE(plan()) from the command line.
At the risk of shooting ourselves in the foot (note the name starts with _ which, by convention, means it is internal and should not be touched by others), we could check that set for certain movables and remove them. Since it is a Python set (not a list), the simplest change is to replace it with a revised set that omits those movables. Something like this:
# ar, ay, and dy stand for the motor objects of those stages
RE._movable_objs_touched = RE._movable_objs_touched - {ar, ay, dy}
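A slightly more defensive version of the same idea is sketched below. It is an assumption-laden illustration, not a tested fix: forget_movables and FakeRE are hypothetical names, strings stand in for the motor objects, and the private attribute may change between bluesky versions, which is why the helper tolerates it being absent.

```python
class FakeRE:
    """Stand-in exposing only the private attribute discussed above."""

    def __init__(self, touched):
        self._movable_objs_touched = set(touched)


def forget_movables(run_engine, *movables):
    """Remove movables from the engine's 'touched' set, in place.

    Returns the set of objects that were actually removed. Does nothing
    if the (private, unsupported) attribute is not present.
    """
    touched = getattr(run_engine, "_movable_objs_touched", None)
    if touched is None:
        return set()
    removed = touched.intersection(movables)
    touched.difference_update(movables)
    return removed


fake_re = FakeRE({"ar", "ay", "dy", "sample_x"})
removed = forget_movables(fake_re, "ar", "ay", "dy")
print(sorted(removed), sorted(fake_re._movable_objs_touched))
```

Judging from the _set() excerpt above, a movable is added again each time a set message for it is processed, so a call like this would have to run after the last set of those motors and before any suspension for it to have an effect.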
We had a beam dump which likely caught BS at a bad moment, and this ended all runs:
This also sent this e-mail:
Looks to me like we may have an issue with the suspender suspending at the wrong time?