FXI: triggering Andor detector takes long time

gmysage commented 6 years ago

It looks like triggering the detector takes a large portion of overhead time. Here is what I have tried (exposure time and acquire period are set to 0.05s for all test):

For Andor camera, I can reach 17 Hz if running in continuous mode in CSS.
If I run the scan: RE(count([Andor], 100), it takes 90sec for 100 images, corresponding to scanning rate of 1.2Hz
If I choose to take multiple images (e.g. 10 images) per each triggering, and run RE(count([Andor], 10)), it gives a image shape of (10, 10, 2160, 2560), effectively 100 images using 17 sec, which means the scanning rate is about 5.9 Hz
If I choose to take 100 images per each triggering, and run RE(count([Andor], 1)), it gives a image shape of (1, 100, 2160,2560), and it takes about 10sec, so it is 10 Hz

The problem in taking multiple images per triggering is that, it does not record the motor position (e.g. rotation stage in "fly scan")

Based on the observation, it looks like triggering the detector takes a lot of overhead:

In [9]: RE(count([Andor], 10))
19:19:03.532400 stage             -> Andor           args: (), kwargs: {}
19:19:03.777227 open_run          -> None            args: (), kwargs: {'detectors': ['Andor'], 'num_points': 10, 'num_intervals': 9, 'plan_args': {'detectors': ["Manta(prefix='XF:18IDB-BI{Det:Neo}', name='Andor', read_attrs=['hdf5', 'stats1', 'stats5'], configuration_attrs=['cam'])"], 'num': 10}, 'plan_name': 'count', 'hints': {'dimensions': [(('time',), 'primary')]}}
19:19:03.909235 checkpoint        -> None            args: (), kwargs: {}
19:19:03.909457 trigger           -> Andor           args: (), kwargs: {'group': 'trigger-82cf66'}
19:19:03.911521 wait              -> None            args: (), kwargs: {'group': 'trigger-82cf66'}
19:19:04.554847 create            -> None            args: (), kwargs: {'name': 'primary'}                                                          
19:19:04.554978 read              -> Andor           args: (), kwargs: {}
19:19:04.580523 save              -> None            args: (), kwargs: {}
19:19:04.589170 checkpoint        -> None            args: (), kwargs: {}
19:19:04.589365 trigger           -> Andor           args: (), kwargs: {'group': 'trigger-d5bb5c'}
19:19:04.591320 wait              -> None            args: (), kwargs: {'group': 'trigger-d5bb5c'}
19:19:05.295046 create            -> None            args: (), kwargs: {'name': 'primary'}                                                          
19:19:05.295234 read              -> Andor           args: (), kwargs: {}
19:19:05.297107 save              -> None            args: (), kwargs: {}
19:19:05.300038 checkpoint        -> None            args: (), kwargs: {}
19:19:05.300314 trigger           -> Andor           args: (), kwargs: {'group': 'trigger-e7da9b'}
19:19:05.302524 wait              -> None            args: (), kwargs: {'group': 'trigger-e7da9b'}
19:19:05.953721 create            -> None            args: (), kwargs: {'name': 'primary'}                                                          
19:19:05.953915 read              -> Andor           args: (), kwargs: {}
19:19:05.955782 save              -> None            args: (), kwargs: {}
19:19:05.958449 checkpoint        -> None            args: (), kwargs: {}
19:19:05.958700 trigger           -> Andor           args: (), kwargs: {'group': 'trigger-fac4a8'}
19:19:05.960827 wait              -> None            args: (), kwargs: {'group': 'trigger-fac4a8'}
19:19:06.660897 create            -> None            args: (), kwargs: {'name': 'primary'}                                                          
19:19:06.661033 read              -> Andor           args: (), kwargs: {}
19:19:06.662482 save              -> None            args: (), kwargs: {}
19:19:06.665135 checkpoint        -> None            args: (), kwargs: {}
19:19:06.665372 trigger           -> Andor           args: (), kwargs: {'group': 'trigger-cd8617'}
19:19:06.667395 wait              -> None            args: (), kwargs: {'group': 'trigger-cd8617'}
19:19:07.323329 create            -> None            args: (), kwargs: {'name': 'primary'}                                                          
19:19:07.323448 read              -> Andor           args: (), kwargs: {}
19:19:07.324518 save              -> None            args: (), kwargs: {}
19:19:07.326162 checkpoint        -> None            args: (), kwargs: {}
19:19:07.326309 trigger           -> Andor           args: (), kwargs: {'group': 'trigger-59aac4'}
19:19:07.327962 wait              -> None            args: (), kwargs: {'group': 'trigger-59aac4'}
19:19:08.045736 create            -> None            args: (), kwargs: {'name': 'primary'}                                                          
19:19:08.045932 read              -> Andor           args: (), kwargs: {}
19:19:08.047479 save              -> None            args: (), kwargs: {}
19:19:08.051042 checkpoint        -> None            args: (), kwargs: {}
19:19:08.051365 trigger           -> Andor           args: (), kwargs: {'group': 'trigger-b86e12'}
19:19:08.053808 wait              -> None            args: (), kwargs: {'group': 'trigger-b86e12'}
19:19:08.702118 create            -> None            args: (), kwargs: {'name': 'primary'}                                                          
19:19:08.702250 read              -> Andor           args: (), kwargs: {}
19:19:08.703764 save              -> None            args: (), kwargs: {}
19:19:08.705595 checkpoint        -> None            args: (), kwargs: {}
19:19:08.705758 trigger           -> Andor           args: (), kwargs: {'group': 'trigger-571046'}
19:19:08.707237 wait              -> None            args: (), kwargs: {'group': 'trigger-571046'}
19:19:09.410331 create            -> None            args: (), kwargs: {'name': 'primary'}                                                          
19:19:09.410446 read              -> Andor           args: (), kwargs: {}
19:19:09.411741 save              -> None            args: (), kwargs: {}
19:19:09.413733 checkpoint        -> None            args: (), kwargs: {}
19:19:09.413903 trigger           -> Andor           args: (), kwargs: {'group': 'trigger-bfe105'}
19:19:09.415342 wait              -> None            args: (), kwargs: {'group': 'trigger-bfe105'}
19:19:10.071331 create            -> None            args: (), kwargs: {'name': 'primary'}                                                          
19:19:10.071493 read              -> Andor           args: (), kwargs: {}
19:19:10.072843 save              -> None            args: (), kwargs: {}
19:19:10.075255 checkpoint        -> None            args: (), kwargs: {}
19:19:10.075467 trigger           -> Andor           args: (), kwargs: {'group': 'trigger-b4dd59'}
19:19:10.077603 wait              -> None            args: (), kwargs: {'group': 'trigger-b4dd59'}
19:19:10.790743 create            -> None            args: (), kwargs: {'name': 'primary'}                                                          
19:19:10.790852 read              -> Andor           args: (), kwargs: {}
19:19:10.792053 save              -> None            args: (), kwargs: {}
19:19:10.793862 close_run         -> None            args: (), kwargs: {'exit_status': None, 'reason': None}
19:19:10.798074 unstage           -> Andor           args: (), kwargs: {}
Out[9]: ('8e222e68-56b6-4224-b731-3ef9e904eadc',)

mrakitin commented 6 years ago

Was at FXI with @gmysage diagnosing the problem.

mrakitin commented 6 years ago

@danielballan, @tacaswell, any ideas?

tacaswell commented 6 years ago

I am not surprised, by this. When you trigger the camera from bluesky we wait for the underlying area detector IOC to report that the image is acquired and through all of the AD pipeline processing.

What plugins do you have setup on AD? Turning off ones you do not need will probably help a bit. When in 'free run' mode you are probably not saving images and have blocking callbacks turned off. For data collection you are probably saving files and we turn blocking callbacks on (to make sure everything stays synchronized).

You probably want to monitor the motor position and then re-align them in analysis.

To get a sense of what the timing of the underlying system is, set the Andor up as it would be for data collection (just calling andor.stage() is probably enough) and then in one terminal use camonitor to watch the acquire bit and use CSS or caput to trigger an image acquisition cycle.

mrakitin commented 6 years ago

FYI: the definition of the Andor detector is done here.

gmysage commented 6 years ago

I turned off all the plugins except for "Image1, PVA1, PROC1, TRANS1 FileHDF1". When doing the scan as: RE(count([Andor], 1)), it still gives a ~0.5s wait time --> 0.5s/image. See:

RE(count([Andor], 1))
14:27:12.940888 stage             -> Andor           args: (), kwargs: {}
14:27:13.035984 open_run          -> None            args: (), kwargs: {'detectors': ['Andor'], 'num_points': 1, 'num_intervals': 0, 'plan_args': {'detectors': ["Manta(prefix='XF:18IDB-BI{Det:Neo}', name='Andor', read_attrs=['hdf5'], configuration_attrs=['cam'])"], 'num': 1}, 'plan_name': 'count', 'hints': {'dimensions': [(('time',), 'primary')]}}
Transient Scan ID: 312     Time: 2018/02/23 14:27:13
Persistent Unique Scan ID: '3bbeda5c-d694-4435-8683-814b30ac8bb0'
14:27:13.270258 checkpoint        -> None            args: (), kwargs: {}
14:27:13.270577 trigger           -> Andor           args: (), kwargs: {'group': 'trigger-455549'}
14:27:13.273460 wait              -> None            args: (), kwargs: {'group': 'trigger-455549'}
14:27:19.798874 create            -> None            args: (), kwargs: {'name': 'primary'}                                                          
14:27:19.798985 read              -> Andor           args: (), kwargs: {}
14:27:19.809784 save              -> None            args: (), kwargs: {}
New stream: 'primary'

As I report previously, If I set to take 100 images per each triggering, and redo the scan: RE(count([Andor], 1)), the total wait time is 5.5sec --> 0.055sec/image:

In [5]: RE(count([Andor], 1))
14:27:12.940888 stage             -> Andor           args: (), kwargs: {}
14:27:13.035984 open_run          -> None            args: (), kwargs: {'detectors': ['Andor'], 'num_points': 1, 'num_intervals': 0, 'plan_args': {'detectors': ["Manta(prefix='XF:18IDB-BI{Det:Neo}', name='Andor', read_attrs=['hdf5'], configuration_attrs=['cam'])"], 'num': 1}, 'plan_name': 'count', 'hints': {'dimensions': [(('time',), 'primary')]}}
Transient Scan ID: 312     Time: 2018/02/23 14:27:13
Persistent Unique Scan ID: '3bbeda5c-d694-4435-8683-814b30ac8bb0'
14:27:13.270258 checkpoint        -> None            args: (), kwargs: {}
14:27:13.270577 trigger           -> Andor           args: (), kwargs: {'group': 'trigger-455549'}
14:27:13.273460 wait              -> None            args: (), kwargs: {'group': 'trigger-455549'}
14:27:19.798874 create            -> None            args: (), kwargs: {'name': 'primary'}                                                          
14:27:19.798985 read              -> Andor           args: (), kwargs: {}
14:27:19.809784 save              -> None            args: (), kwargs: {}
New stream: 'primary'

For both cases, images are saved. I think for our flyscan purpose, we want to achieve at least 15 images/sec.

cowanml commented 6 years ago

Is there a zebra/pizza box involved for the position capture/trigger or is this a software level fly scan implemented in ophyd/bluesky?

danielballan commented 6 years ago

@gmysage I strongly suspect that the most costly plugin is also an indispensable one: FileHDF1. It just takes time to write data to disk. To echo @tacaswell, your 17 Hz "continuous mode" measurement is (I assume!) not writing any data, so it's fast.

Thanks for posting those numbers. When I look at them, I am not surprised or concerned. I observe:

Staging takes about 0.01 seconds, but it only has to happen once per run, so it won't affect your rate.
Opening a run takes 0.23 seconds, which seems a bit long, but again it only happens once per run. (If we needed to streamline this, I have ideas, but I don't think it's relevant to the problem at hand.)
The bluesky operations that don't involve waiting for hardware are running extremely fast -- less than 0.001 seconds -- so bluesky's actual overhead is low.
Waiting for the trigger to complete takes a long time because the detector is acquiring and writing data into an HDF5 file.

Your best bet is to take images in large (~100 image) chunks to get that 10 Hz rate you measured. Do you understand @tacaswell's comment, "You probably want to monitor the motor position and then re-align them in analysis"? Do you need some help implementing that?

Let's get that working, to start.

Looking ahead, getting all the way to 15 Hz will take more effort. You'll may have to turn off blocking callbacks, which can result in data loss and must be done very carefully. You may have to use more advanced capture modes on the HDF5 plugin, as CSX does, which requires recent versions of Area Detector itself -- something that Kaz says he is working on.

@cowanml This is a software-level fly scan. Let's keep it simple for now and bring in zebra/pizza box only if we have to.

gmysage commented 6 years ago

@danielballan @tacaswell I may need some help to implement camonitor. I can monitor other motor movement using e.g., camonitor XF:18IDB-OP{SSA:1-Ax:Vgap}Mtr.RBV, but I don't know how to monitor a camera, since I didn't move any motor in the scan `RE(count([Andor], 1)'

I try to disable the HDFplugin. So I comment out all the hdf component for the camera class as:

class Manta(SingleTrigger, AreaDetector):
    image = Cpt(ImagePlugin, 'image1:')
#    stats1 = Cpt(StatsPlugin, 'Stats1:')
#    stats2 = Cpt(StatsPlugin, 'Stats2:')
#    stats3 = Cpt(StatsPlugin, 'Stats3:')
#    stats4 = Cpt(StatsPlugin, 'Stats4:')
#    stats5 = Cpt(StatsPlugin, 'Stats5:')
    trans1 = Cpt(TransformPlugin, 'Trans1:')
#    roi1 = Cpt(ROIPlugin, 'ROI1:')
#    roi2 = Cpt(ROIPlugin, 'ROI2:')
#    roi3 = Cpt(ROIPlugin, 'ROI3:')
#    roi4 = Cpt(ROIPlugin, 'ROI4:')
    proc1 = Cpt(ProcessPlugin, 'Proc1:')

#    hdf5 = Cpt(HDF5PluginWithFileStore,
#              suffix='HDF1:',
#             write_path_template='/NSLS2/xf18id1/DATA/detA1/commissioning',
#               root='/NSLS2/xf18id1/DATA/detA1/',
#               reg=None)  # placeholder to be set on instance as obj.hdf5.reg

    def stop(self):
        self.hdf5.capture.put(0)
        return super().stop()

    def pause(self):
        self.hdf5.capture.put(0)
        return super().pause()

    def resume(self):
        self.hdf5.capture.put(1)
        return super().resume()

    @property
    def hints(self):
        return {'fields': [self.stats1.total.name,
                           self.stats5.total.name]}

Andor = Manta('XF:18IDB-BI{Det:Neo}', name='Andor')
Andor.stage_sigs['cam.image_mode'] = 0

As a test, after the modification, when I type Andor.stage(), hdfplug keep being 'disabled' in CSS.

Then, when I run the scan, it still gives large wait time. And if I try to fetch the image data using: img=np.array(list[db[-1].data('Andor_image')]), it shows no data....

In [1]: RE(count([Andor], 1))
16:15:47.302287 stage             -> Andor           args: (), kwargs: {}
16:15:48.171506 open_run          -> None            args: (), kwargs: {'detectors': ['Andor'], 'num_points': 1, 'num_intervals': 0, 'plan_args': {'detectors': ["Manta(prefix='XF:18IDB-BI{Det:Neo}', name='Andor', read_attrs=['hdf5'], configuration_attrs=['cam'])"], 'num': 1}, 'plan_name': 'count', 'hints': {'dimensions': [(('time',), 'primary')]}}
Transient Scan ID: 315     Time: 2018/02/23 16:15:48
Persistent Unique Scan ID: '6e9f0fc1-5945-42e3-89cd-9be8a2105b39'
16:15:48.350687 checkpoint        -> None            args: (), kwargs: {}
16:15:48.350954 trigger           -> Andor           args: (), kwargs: {'group': 'trigger-a92fe2'}
16:15:48.362670 wait              -> None            args: (), kwargs: {'group': 'trigger-a92fe2'}
16:15:48.879306 create            -> None            args: (), kwargs: {'name': 'primary'}                                                          
16:15:48.879459 read              -> Andor           args: (), kwargs: {}
16:15:49.038213 save              -> None            args: (), kwargs: {}
New stream: 'primary'

gmysage commented 6 years ago

Dear all, do you have more suggestions? Thanks,

gmysage commented 6 years ago

I have done the following to test the time consumed by saving image. The following results suggest that communication to confirm "some thing is successfully done" takes a significant amount of overhead.

I run the following two types of scripts to test the time to save images, and use camonitor to monitor the PV XF:18IDB-BI{Det:Neo}TIFF1:WriteFile:

1st:

cap_img = 'XF:18IDB-BI{Det:Neo}TIFF1:WriteFile'
for i in range(10): 
    my_save_cmd = 'caput ' + cap_img + ' ' + 'Write'
    subprocess.Popen(my_save_cmd, shell=True, stdout=subprocess.PIPE,  stderr=subprocess.STDOUT)
    time.sleep(0.02)

for this script, camonitor gives following results. The time for single step is <0.04s. I have also confirm that I have saved 10 images in my directory. If I reduce the sleep time(e.g. 0.01), I will miss images.

XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 13:50:04.809234 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 13:50:04.838784 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 13:50:04.877956 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 13:50:04.915422 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 13:50:04.957302 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 13:50:04.995035 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 13:50:05.036236 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 13:50:05.078135 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 13:50:05.114900 Done

2nd script:

cap_img = 'XF:18IDB-BI{Det:Neo}TIFF1:WriteFile'
for i in range(10):    
    my_save_cmd = 'caput ' + cap_img + ' ' + 'Write'
    subprocess.Popen(my_save_cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    while True:
        r =subprocess.check_output(['caget', cap_img, '-t']).rstrip()
        r = str(r)[2:-1]
        if r == 'Done':
            break

In this script, I code it to confirm the status of saving image before starting another image saving. The camonitor gives an average time for single step of 0.12s. Following is the output of camonitor:

XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 14:02:00.613954 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 14:02:17.837061 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 14:02:17.958904 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 14:02:18.084958 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 14:02:18.209414 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 14:02:18.332542 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 14:02:18.449850 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 14:02:18.563026 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 14:02:18.681280 Done  
XF:18IDB-BI{Det:Neo}TIFF1:WriteFile 2018-02-26 14:02:18.808993 Done

mrakitin commented 6 years ago

@gmysage, thanks for the test results. Are you available tomorrow after 5 pm? @danielballan and me can come and try to help you at the beamline.

gmysage commented 6 years ago

Thank you. I will meet you tomorrow at 5pm at beamline.

NSLS-II / Bug-Reports

FXI: triggering Andor detector takes long time #188