APS-USAXS / usaxs-bluesky-ended-2023

Bluesky instrument for USAXS
0 stars 0 forks source link

Yet another timeout #554

Closed jilavsky closed 2 years ago

jilavsky commented 2 years ago

Another timeout issue, same PV on different ioc (this is SAXS, before was WAXS). Will restart BS again.

Instrument will be available later today for testing, if needed. I am trying to collect some test data to prove something, but then it will be available, if needed.

date: 2022-02-20 15:43:20.262872
command file: /share1/USAXS_data/2022-02/usaxs.mac
line number: 23
command: ('saxsExp', ['20', '40.7', '1', 'AgBehenateLaB6'], 23, '      saxsExp        20      40.7      1      "AgBehenateLaB6"')
raw command:       saxsExp        20      40.7      1      "AgBehenateLaB6"
attempt: 1 of 1
exception: Attempted to set EpicsPathSignal(read_pv='usaxs_pilatus1:HDF1:FilePath_RBV', name='saxs_det_hdf1_file_path', parent='saxs_det_hdf1', value='/mnt/usaxscontrol/USAXS_data/2022-02/02_20_JanTest/02_20_JanTest_saxs/', timestamp=1645393390.138545, auto_monitor=True, string=True, write_pv='usaxs_pilatus1:HDF1:FilePath', limits=False, put_complete=False, path_semantics='posix') to value '/mnt/usaxscontrol/USAXS_data/2022-02/02_20_JanTest/02_20_JanTest_saxs/' and timed out after 10 seconds. Current value is '/mnt/usaxscontrol/USAXS_data/2022-02/02_20_JanTest/02_20_JanTest_saxs'.
Stopping further processing of this command list.
jilavsky commented 2 years ago

Restarted BS, now got similar issue with BlackFly camera. Code continued, but hanged for long time - timeout?

D Sun-15:50:13 - gain=10000000.0  rate: 82360.0  max: 950000  converged=[True, True]

W Sun-15:51:13 - Could not take sample image:path=/share1/USAXS_data/2022-02/02_20_JanTest/02_20_JanTest_saxs, file=AgBehenateLaB6, order#=1, exc=Status(obj=EpicsSignalWithRBV(read_pv='9idFLY2:JPEG1:FileName_RBV', name='blackfly_optical_jpeg1_file_name', parent='blackfly_optical_jpeg1', value='A', timestamp=1645393813.9335833, auto_monitor=True, string=True, write_pv='9idFLY2:JPEG1:FileName', limits=False, put_complete=False), done=True, success=False)
D Sun-15:51:13 - areaDetectorAcquire(): OrderedDict([('enable', 1), ('auto_increment', 'Yes'), ('array_counter', 0), ('auto_save', 'Yes'), ('num_capture', 0), ('file_template', '%s%s_%4.4d.hdf'), ('file_write_mode', 'Single'), ('blocking_callbacks', 'No'), ('parent.cam.array_callbacks', 1)])

Transient Scan ID: 13     Time: 2022-02-20 15:51:14
Persistent Unique Scan ID: '647b993e-bf1b-4324-9bce-6954271be425'

What is it with the AD iocs and BS today?

prjemian commented 2 years ago

First item happens during detector staging. I'm starting to form a stronger opinion about staging, that perhaps we use it too much for the way it is implemented. Thinking we could refactor many items out of stage_sigs and into yield from bps.mv() so we control exactly when and how often a PV is written. When the RE stages an ophyd object, it calls the objects ophyd.stage() method directly, without concern this might take some time to complete. Same for .unstage(). The code is not obvious about how it handles timeouts, either.

Too tricky by half and we keep getting skewered. Let's address the detector staging issues first since they are most prominent.

Second item (with the blackfly camera) is not obvious what happened. Could not take sample image lacks some detail. Looking, next, at the log file for that time.

prjemian commented 2 years ago

Log file entries for that second:

|2022-02-20 15:50:13.554|DEBUG|43187|bluesky-session|amplifiers|539|bluesky-run-engine| - gain=10000000.0  rate: 82360.0  max: 950000  converged=[True, True]
|2022-02-20 15:50:13.567|DEBUG|43187|bluesky-session|amplifiers|545|bluesky-run-engine| - converged: [True, True]
|2022-02-20 15:51:13.967|WARNING|43187|bluesky-session|sample_imaging|62|bluesky-run-engine| - Could not take sample image:path=/share1/USAXS_data/2022-02/02_20_JanTest/02_20_JanTest_saxs, file=AgBehenateLaB6, order#=1, exc=Status(obj=EpicsSignalWithRBV(read_pv='9idFLY2:JPEG1:FileName_RBV', name='blackfly_optical_jpeg1_file_name', parent='blackfly_optical_jpeg1', value='A', timestamp=1645393813.9335833, auto_monitor=True, string=True, write_pv='9idFLY2:JPEG1:FileName', limits=False, put_complete=False), done=True, success=False)
|2022-02-20 15:51:13.980|DEBUG|43187|bluesky-session|area_detector|34|bluesky-run-engine| - areaDetectorAcquire(): OrderedDict([('enable', 1), ('auto_increment', 'Yes'), ('array_counter', 0), ('auto_save', 'Yes'), ('num_capture', 0), ('file_template', '%s%s_%4.4d.hdf'), ('file_write_mode', 'Single'), ('blocking_callbacks', 'No'), ('parent.cam.array_callbacks', 1)])
prjemian commented 2 years ago

This block of code: https://github.com/APS-USAXS/ipython-usaxs/blob/887377a7eadb0d45875dcd1a55a9a57b6aa3cf5b/profile_bluesky/startup/instrument/plans/sample_imaging.py#L50-L52 and this particular step in the .image_prep() method: https://github.com/APS-USAXS/ipython-usaxs/blob/887377a7eadb0d45875dcd1a55a9a57b6aa3cf5b/profile_bluesky/startup/instrument/devices/blackfly_module.py#L94

You say the code hung for some time, looks like 60 seconds which is the default write timeout that we set: https://github.com/APS-USAXS/ipython-usaxs/blob/887377a7eadb0d45875dcd1a55a9a57b6aa3cf5b/profile_bluesky/startup/instrument/framework/initialize.py#L124-L131

Since this is the sample imaging detector, we have already agreed that this is a value-added but optional item.

And Yes, it is a bad set of problems using BS.

prjemian commented 2 years ago

With the decision to refactor from detector.stage_sigs to explicit calls to bps.mv(), I'll close this BUG report issue.