Status: Open. jilavsky opened this issue 1 year ago.
I'm suspicious of jobs running in background threads. They may not get cancelled under those conditions. Is it too much to suggest a workaround: exit and restart?
On Thu, Mar 30, 2023, 6:32 PM Jan Ilavsky wrote:
Assigned #590 https://github.com/APS-USAXS/usaxs-bluesky/issues/590 to @prjemian https://github.com/prjemian.
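A minimal stdlib sketch of the concern about background threads, assuming the collection runs in an ordinary Python `threading.Thread` (the thread does not say how those jobs are launched). CPython raises KeyboardInterrupt only in the main thread, and there is no supported way to kill another thread, so a worker stops only if it checks a cancellation flag. `collect_point`, `worker`, and `run_with_cancel` are hypothetical names.

```python
import threading
import time


def collect_point(i):
    """Hypothetical stand-in for acquiring one data point."""
    time.sleep(0.01)
    return i


def worker(cancel, results, n_points=1000):
    """Collect points until done or until `cancel` is set.

    Python offers no supported way to kill a thread from outside; this loop
    stops early only because it checks the event between points.
    """
    for i in range(n_points):
        if cancel.is_set():
            break
        results.append(collect_point(i))


def run_with_cancel(stop_after=0.05):
    """Start a worker, then cancel it the way a main-thread Ctrl-C handler would."""
    cancel = threading.Event()
    results = []
    t = threading.Thread(target=worker, args=(cancel, results), daemon=True)
    t.start()
    time.sleep(stop_after)  # stand-in for "user presses Ctrl-C in the main thread"
    cancel.set()            # the main thread must explicitly tell the worker to stop
    t.join(timeout=2.0)
    return t.is_alive(), len(results)
```

Without the `cancel.set()` call, which a main-thread `except KeyboardInterrupt:` block would normally make, the worker keeps collecting after Ctrl-C (and a non-daemon worker would also keep the process alive).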
Which specific data collection continued?
(In reply to Pete Jemian's message above, sent Thu, Mar 30, 2023, 8:46 PM.)
When I logged in, it was collecting WAXS, but it is not clear how long it had been going before Ivan got me to the computer. But we can figure all this out...
The function was running a USAXS-SAXS-WAXS loop. The data from the run where everything failed are here: 03_30_Andrew/03_30_Andrew_usaxs/TestOLV_179C__1100PSI_67min_0568.h5. In this USAXS scan we can see the loss of beam about halfway through (170 points or so), BUT the scan continues. When the APS/instrument lost beam, staff tried to abort so they could go inside the hutch and replace the sample (this was an in situ experiment). Somehow, after staff pressed Ctrl-C and ran RE.abort() (supposedly done), the instrument continued to collect USAXS data and afterwards also collected SAXS and WAXS (both files exist). I pressed Ctrl-C (4x, best guess) during the SAXS/WAXS data collection, and the system stopped after WAXS. It was a bit frustrating, since the RE was counting time and kept printing messages even though I kept hitting Ctrl-C. It was not clear whether I was actually talking to the command window or what was happening.
Exit and restart is the right action here, but some staff are worried about getting back up and running. There is a list with the procedure, but some staff and all users prefer not to try restarting BS: there are too many ways to forget something, especially newUser and loading any custom code they may be running.
OK, the source of this problem is getting clearer, and it's very familiar. We installed some handling to "recover" from exceptions, and it's that handling which is failing us here. Given there are just a few weeks left in the run before APS-U, let's leave this as a won't-fix-for-now problem.
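One plausible mechanism for "recovery" handling that defeats an abort, sketched in plain Python and not taken from the usaxs-bluesky code: if a plan-like generator wraps each scan in a handler that catches too broadly, it also catches the exception a runner throws into it to request an abort, and simply moves on to the next scan. `AbortRequested`, `swallowing_batch`, `strict_batch`, and `drive` are hypothetical; bluesky's RunEngine has its own exception types and abort mechanics.

```python
class AbortRequested(Exception):
    """Hypothetical stand-in for whatever a runner throws into a plan to abort it."""


def swallowing_batch(scans):
    """'Recovery' handler that catches everything, including abort requests."""
    for scan in scans:
        try:
            yield scan
        except Exception as exc:  # too broad: an abort looks like any other error
            print(f"recovering from {exc!r}, continuing with next scan")
            continue


def strict_batch(scans):
    """Same loop, but an abort request is re-raised before the generic handler."""
    for scan in scans:
        try:
            yield scan
        except AbortRequested:
            raise  # never "recover" from an abort
        except Exception as exc:
            print(f"recovering from {exc!r}, continuing with next scan")


def drive(plan, abort_after):
    """Run a plan, throwing AbortRequested into it after `abort_after` steps.

    Returns every step the plan produced, including any produced after the abort.
    """
    done = [next(plan)]
    try:
        while True:
            if len(done) == abort_after:
                done.append(plan.throw(AbortRequested("user abort")))
            else:
                done.append(next(plan))
    except (StopIteration, AbortRequested):
        pass
    return done
```

With three scans and an abort after the first, the swallowing version still produces SAXS and WAXS, while the strict version stops after USAXS. A bare `except:` would be worse still, since it also catches KeyboardInterrupt, which derives from BaseException rather than Exception.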
For the future, we must make the scans much more robust against this situation. This involves try..except..else..finally wrapping of the scans, with signaling to decide whether to continue or break out of the scan batch. We already have an issue for this, so this one is a duplicate.
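A minimal sketch of that wrapping in plain Python, under stated assumptions: each scan is a callable, an ordinary scan failure is logged and the batch continues, but an abort request or KeyboardInterrupt breaks out of the whole batch, and the `finally` clause runs cleanup for every scan either way. `AbortBatch`, `Decision`, and `run_batch` are hypothetical names, not existing usaxs-bluesky or bluesky API.

```python
import enum


class AbortBatch(Exception):
    """Hypothetical signal that the whole batch (not just one scan) must stop."""


class Decision(enum.Enum):
    CONTINUE = "continue"
    BREAK = "break"


def run_batch(scans, cleanup):
    """Run (name, callable) pairs; return (log, decision that ended the batch)."""
    log = []
    decision = Decision.CONTINUE
    for name, scan in scans:
        try:
            scan()
        except (AbortBatch, KeyboardInterrupt):
            log.append(f"{name}: aborted")
            decision = Decision.BREAK     # abort: do not start any further scan
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            decision = Decision.CONTINUE  # ordinary failure: move on to the next scan
        else:
            log.append(f"{name}: ok")
            decision = Decision.CONTINUE
        finally:
            cleanup(name)                 # e.g. close shutters; runs for every scan
        if decision is Decision.BREAK:
            break
    return log, decision
```

Whether an abort should also be re-raised to the caller, rather than only reported through the returned decision, is a design choice this sketch leaves open.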
User is confused and asking for help. They are running one of the data collection functions in the run engine; when the APS dumps the ring, the user presses Ctrl-C, then runs RE.abort(), and data collection resumes, even without beam, which should not be possible. The user calls staff; staff logs in remotely and presses Ctrl-C multiple times to abort. After the WAXS data collection finishes, eventually (4x Ctrl-C?) the run engine returns to the prompt without needing RE.abort(). This is not reasonable: users need a way to abort reliably, cleanly, and easily. I suspect the suspender was aborted by the first Ctrl-C and killed by RE.abort(), but then for some reason the system returned to the data collection loop and kept going, without suspending for the beam dump and ignoring the user's cries for help. Can we assign a different Ctrl-something to abort everything in the run engine? Something obvious.