APS-USAXS / usaxs-bluesky-ended-2023

Bluesky instrument for USAXS

Ctrl-C did not abort data collection #590

Open jilavsky opened 1 year ago

jilavsky commented 1 year ago

A user got confused and asked for help. They were running one of the data collection functions in the run engine, and when APS dumped the ring, the user pressed ctrl-C and then ran RE.abort(). Data collection resumed, even without beam, which should not be possible. The user called staff; staff logged in remotely and pressed ctrl-C multiple times to abort. After the WAXS data collection finished, the run engine eventually (after 4x ctrl-C?) returned to the prompt without needing RE.abort(). This is not reasonable: users need a way to abort reliably, cleanly, and easily. I suspect the suspender was aborted by the first ctrl-C and killed by RE.abort(), but then for some reason the system returned to the data collection loop and kept going, without suspending for the beam dump and ignoring the user's cries for help. Can we assign a different ctrl-something to abort everything in the run engine? Something obvious.

prjemian commented 1 year ago

I'm suspicious of jobs running in background threads. They may not get cancelled under those conditions. Is it too much to suggest a workaround of exit and restart?
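
To illustrate the background-thread concern, here is a generic Python sketch (not code from this instrument; `background_job` and `stop_requested` are hypothetical names). Python raises `KeyboardInterrupt` only in the main thread, so a worker thread keeps running unless it checks a shared stop flag:

```python
import threading


def background_job(stop_requested, log):
    """Hypothetical worker: keeps running until the stop flag is set."""
    while not stop_requested.is_set():
        log.append("collecting")
        # wait() returns early once the flag is set, so shutdown is prompt.
        stop_requested.wait(0.01)
    log.append("stopped")


def run_with_cooperative_cancel():
    stop_requested = threading.Event()
    log = []
    worker = threading.Thread(target=background_job, args=(stop_requested, log))
    worker.start()
    try:
        # Stand-in for the main-thread work that ctrl-C would interrupt.
        raise KeyboardInterrupt
    except KeyboardInterrupt:
        # The interrupt reached only this thread; the worker must be told to stop.
        stop_requested.set()
    worker.join(timeout=1.0)
    return log, worker.is_alive()
```

Without the `stop_requested.set()` call, the worker in this sketch would keep appending "collecting" after the main thread had handled the interrupt.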

On Thu, Mar 30, 2023, 6:32 PM Jan Ilavsky @.***> wrote:

Assigned #590 https://github.com/APS-USAXS/usaxs-bluesky/issues/590 to @prjemian https://github.com/prjemian.

prjemian commented 1 year ago

Which specific data collection continued?

jilavsky commented 1 year ago

When I logged in it was doing WAXS, but it was not clear how long it had been before Ivan got me to the computer. But we can figure all this out...

The function was running a loop of USAXS-SAXS-WAXS. The data from when it all failed are here: 03_30_Andrew/03_30_Andrew_usaxs/TestOLV_179C__1100PSI_67min_0568.h5. In this USAXS scan we can see loss of beam about halfway through (170 points or so), BUT the scan continues. When APS/the instrument lost beam, staff tried to abort so they could go inside the hutch and replace the sample (this was an in situ experiment). Somehow, after staff did ctrl-C and RE.abort() (supposedly), the instrument continued to collect USAXS data and after that also collected SAXS and WAXS (both of which exist). I pressed ctrl-C (4x, best guess) during the SAXS/WAXS data collections, and the system stopped after WAXS. It was a bit frustrating, since the RE was counting time and kept printing messages even though I kept hitting ctrl-C. It was not clear whether I was actually talking to the command window or what was happening.

jilavsky commented 1 year ago

Exit and restart is the right action here, but some staff are worried about getting back. There is a list with the procedure, but some staff and all users prefer not to try to restart BS. There are too many opportunities to forget something, especially newUser and loading any custom code they may be running.

prjemian commented 1 year ago

Ok, the source of this problem is getting clearer. It's very familiar. We installed some handling to "recover" from exceptions. It's that handling which is failing us here. Given there are just a few weeks in the run before APS-U, let's leave this as a "won't fix right now" problem.
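
A plain-Python sketch of how such "recover" handling can defeat an abort. This is an assumption about the failure mode, not the instrument's actual code; `AbortRequested`, `batch_with_blanket_recovery`, and `drive` are hypothetical names. A generator-based batch that catches every `Exception` also swallows an abort thrown into it and moves on to the next scan:

```python
class AbortRequested(Exception):
    """Hypothetical stand-in for an abort request thrown into a running batch."""


def batch_with_blanket_recovery(scans):
    for name in scans:
        try:
            yield f"start {name}"
            yield f"finish {name}"
        except Exception as exc:
            # Meant to survive a failed scan, but it also swallows the abort.
            yield f"recovered from {type(exc).__name__}"


def drive(plan, abort_at):
    """Run the plan, throwing AbortRequested into it once, at step abort_at."""
    steps = []
    try:
        step = next(plan)
        while True:
            steps.append(step)
            if len(steps) == abort_at:
                step = plan.throw(AbortRequested("operator abort"))
            else:
                step = next(plan)
    except (StopIteration, AbortRequested):
        pass
    return steps


steps = drive(batch_with_blanket_recovery(["USAXS", "SAXS", "WAXS"]), abort_at=1)
```

In this sketch the abort arrives during USAXS, yet `steps` still contains the SAXS and WAXS entries, which resembles the behavior reported above.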

For the future, we must make the scans much more robust against this situation. That involves try..except..else..finally wrapping of the scans, with signaling to decide whether to continue or break out of the scan batch. We've got an issue for this already and this one is a duplicate.
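
A minimal sketch of what that wrapping could look like, written as ordinary synchronous Python rather than as real bluesky plans. All names here (`AbortRequested`, `ScanFailed`, `run_batch`) are hypothetical, chosen only to show the shape: a failed scan continues the batch, an abort request breaks out of it, and cleanup runs either way.

```python
class AbortRequested(Exception):
    """Hypothetical signal: stop the whole batch now."""


class ScanFailed(Exception):
    """Hypothetical signal: this scan failed, the next one may proceed."""


def run_batch(scans, run_scan):
    """Run scans in order; return (completed, log)."""
    completed, log = [], []
    for name in scans:
        try:
            run_scan(name)
        except ScanFailed as exc:
            log.append(f"{name}: failed ({exc}), continuing")
            continue
        except AbortRequested:
            log.append(f"{name}: abort requested, leaving batch")
            break
        else:
            # Reached only when the scan raised nothing.
            completed.append(name)
        finally:
            # Runs on success, failure, and abort alike.
            log.append(f"{name}: cleanup")
    return completed, log


def abort_during_saxs(name):
    """Hypothetical scan function: the operator aborts during SAXS."""
    if name == "SAXS":
        raise AbortRequested()


completed, log = run_batch(["USAXS", "SAXS", "WAXS"], abort_during_saxs)
```

The key design choice is that the abort signal gets its own `except` branch that breaks out of the loop, instead of falling into the same handler that recovers from an ordinary scan failure.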

prjemian commented 1 year ago