Closed ambarb closed 6 years ago
In this case, the progress bar is watching the ArrayCounter PV. I'm not sure how to make that more robust.
Can you clarify what you meant by, "But noticed that when I made the number of images per point = 1, that there were a lot of dropped points"?
@danielballan RE: "But noticed that when I made the number of images per point = 1, that there were a lot of dropped points"? This was because the DONE TRIGGERING thing happened immediately. See https://github.com/NSLS-II/Bug-Reports/issues/178
The original occurrence at the top happened two more times. On these occasiotins we did not need to restart bluesky.
Here is the plan we are running:
def single_ph_ct(sets):
yield from ssh_close()
yield from bp.sleep(2)
yield from count(dets)
olog('ScanNo{} Darks at {:.2f} eV'.format(db[-1].start['scan_id'], pgm_en.readback.value))
yield from ssh_open()
yield from bp.sleep(2)
yield from count(dets,num=sets)
olog('ScanNo{} Lights at {:.2f} eV'.format(db[-1].start['scan_id'], pgm_en.readback.value))
yield from ssh_close()
yield from bp.sleep(2)
yield from count(dets)
olog('ScanNo{} Darks at {:.2f} eV'.format(db[-1].start['scan_id'], pgm_en.readback.value))
Here is one example of the stopped scan (with no keyboard interupt)
In [37]:
In [37]:
In [37]: RE(single_ph_ct(20))
Transient Scan ID: 83014 @ 2017/09/10 18:43:47
Persistent Unique Scan ID: 'dc3b9b8d-2aec-42f1-9b95-ade00e042678'
+-----------+------------+-------------------+███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5000/5000 [03:20<00:00, 24.99images/s]
| seq_num | time | fccd_stats4_total |
+-----------+------------+-------------------+
| 1 | 18:47:07.9 | 81754 |
+-----------+------------+-------------------+
generator count ['dc3b9b'] (scan num: 83014)
Transient Scan ID: 83015 @ 2017/09/10 18:47:11
Persistent Unique Scan ID: '77d2332f-8ab2-4208-a0e6-29ec85f4690c'
+-----------+------------+-------------------+
| seq_num | time | fccd_stats4_total |
+-----------+------------+-------------------+
| 1 | 18:50:32.5 | 222849 |
| 2 | 18:53:52.6 | 273249 |
| 3 | 18:57:12.7 | 176688 |
| 4 | 19:00:32.7 | 208191 |
| 5 | 19:03:53.0 | 255491 |
fcA 'deferred pause' has been requested. The RunEngine will pause at the next checkpoint. To pause immediately, hit Ctrl+C again in the next 10 seconds.█████ | 4872/5000 [03:14<00:05, 25.00images/s]
Deferred pause acknowledged. Continuing to checkpoint.
^C
Your RunEngine is entering a paused state. These are your options for changing
the state of the RunEngine:
RE.resume() Resume the plan.
RE.abort() Perform cleanup, then kill plan. Mark exit_stats='aborted'.
RE.stop() Perform cleanup, then kill plan. Mark exit_status='success'.
RE.halt() Emergency Stop: Do not perform cleanup --- just stop.
Pausing...
Out[37]:
['dc3b9b8d-2aec-42f1-9b95-ade00e042678',
'77d2332f-8ab2-4208-a0e6-29ec85f4690c']
In [38]: RE.stop()
Stopping: running cleanup and marking exit_status as 'success'...
+-----------+------------+-------------------+
generator count ['77d233'] (scan num: 83015)
Both PVs monitored seperately indicated that the camera was stuck in an acquire mode: XF:23ID1-ES{FCCD}HDF1:NumCaptured_RBV XF:23ID1-ES{FCCD}cam1:ArrayCounter_RBV
Will try to capture graphically again if this happens. The attached plot of the image PVs show that the camera actually stopped. @stuwilkins do you think there is something on our side?
@danielballan The root cause is likely something to do with our detector. However, if there is a timeout on motors, why is there not a timeout on detectors if they are not updating? During this failure mode, the FCCD is essentially frozen until we ^C^C
and abort or stop. After which, the detector behaves as expected. If we were to exit the RE
because of a time out, I assume that the detector would be allowed to self-trigger again (as if aborting or stopping). Am i correct?
Note for anyone following along: we resolved this locally. Also see https://github.com/NSLS-II/ophyd/pull/444
Scanning the detector at 0.1s exposure with total acquire period of 0.3. 20 images per point
After the 3rd point, the data collection paused at 20%. After aborting, pressing enter gives multiple lines of the same (or similiar) errors. FCCD when back to normal state after run aborted.
Had to quit bsui.
On restarting, could not reproduce the error. But noticed that when I made the number of images per point = 1, that there were a lot of dropped points. This is okay. But is a big clue to the issue reported below. Maybe bluesky lost count at 20% for point 3 and then could never recover. Is there a way to make this progress bar more robust? Below is not highly reproducible, but it is certainly not a one off.