deisseroth-lab / two-photon

Common scripts, libraries, and utilities for 2p experiments
5 stars 6 forks source link

indexing error on artifact removal #10

Open tbenst opened 3 years ago

tbenst commented 3 years ago

I'm not positive, but I think this error may relate to "Handle dual-channel where one channel is functional and the other is still measured." In this recording, located at /oak/stanford/groups/deissero/users/tyler/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044 (error trace below using my local path-- can replace /data/dlab with /oak/stanford/groups/deissero/users/tyler for sherlock), I record a structural channel in red (Ch2) and a functional channel in green (Ch3). I primarily care about removing stim artefact from Ch3, although don't mind removing from both!

I get the following error on preprocessing:

> python ~/code/two-photon/two-photon/process.py --input_dir /data/dlab/b115/ --output_dir /data/dlab/b115/process-output/ --recording 2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2:TSeries_lrhab_raphe_40trial-044 --preprocess
2021-03-31 18:55:50.618 metadata:22 INFO Extracting metadata from xml files:
/data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/TSeries_lrhab_raphe_40trial-044.xml
/data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/TSeries_lrhab_raphe_40trial-044_Cycle00001_VoltageRecording_001.xml
2021-03-31 18:55:53.797 metadata:102 INFO The following metadata is written to: /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/output/metadata.json
{'channels': {0: {'enabled': True, 'name': 'frame starts'},
              1: {'enabled': True, 'name': 'secondary'},
              2: {'enabled': True, 'name': 'winfluo'},
              3: {'enabled': True, 'name': 'Blue'},
              4: {'enabled': True, 'name': 'VR timestamps'},
              5: {'enabled': True, 'name': 'green'},
              6: {'enabled': True, 'name': 'LED'},
              7: {'enabled': True, 'name': 'respir'}},
 'laser': {'power': None, 'wavelength': None},
 'layout': {'frames_per_sequence': 5, 'sequences': 7272},
 'optical_zoom': 1.0,
 'period': 0.0655215,
 'size': {'channels': 2,
          'frames': 7272,
          'x_px': 1024,
          'y_px': 1024,
          'z_planes': 5}}
2021-03-31 18:55:53.960 process:92 INFO Found stim channel "respir", enabled=True
TYLER DEBUG: /data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/TSeries_lrhab_raphe_40trial-044_Cycle00001_VoltageRecording_001.csv
2021-03-31 18:56:07.721 process:161 INFO Read voltage recordings from: /data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/TSeries_lrhab_raphe_40trial-044_Cycle00001_VoltageRecording_001.csv, preview:
   Time(ms)  frame starts  secondary  ...     green       LED    respir
0       0.0      0.086975   0.013428  ...  0.020142  0.000916 -0.028687
1       0.2      4.942322   0.013733  ...  0.020447  0.001526 -0.030518
2       0.4      4.935303   0.014038  ...  0.020752  0.000916 -0.028381
3       0.6      4.938049   0.013428  ...  0.020752  0.000916 -0.028992
4       0.8      4.939880   0.014343  ...  0.020447  0.001221 -0.027771

[5 rows x 9 columns]
2021-03-31 18:56:11.504 utils:129 INFO Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2021-03-31 18:56:11.504 utils:141 INFO NumExpr defaulting to 8 threads.
2021-03-31 18:56:11.764 artefacts:15 INFO Stored calculated frame starts in /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/output/frame_start.h5, preview:
Int64Index([1, 334, 667, 1000, 1333], dtype='int64')
2021-03-31 18:56:11.769 artefacts:21 INFO Calculating artefact regions
2021-03-31 18:56:15.409 artefacts:41 INFO Stored calculated artefact positions in /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/output/artefact.h5, preview:
       z_plane  y_min  y_max
frame                       
60           3    479    512
61           1    455    490
61           4    433    469
62           2    414    449
63           0    390    426
2021-03-31 18:56:20.405 tiffdata:43 INFO Found data with shape(frames, z_planes, y_pixels, x_pixels): (7272, 5, 1024, 1024)
2021-03-31 18:56:20.722 transform:41 INFO Writing uncorrected data to /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/hdf5/uncorrected/uncorrected.h5
[########################################] | 100% Completed |  6min 45.8s
2021-03-31 19:03:10.541 transform:46 INFO Writing corrected data to /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish2/TSeries_lrhab_raphe_40trial-044/hdf5/data/data.h5
[                                        ] | 1% Completed | 53.8s
Traceback (most recent call last):
  File "/home/tyler/code/two-photon/two-photon/process.py", line 354, in <module>
    main()
  File "/home/tyler/code/two-photon/two-photon/process.py", line 114, in main
    preprocess(basename_input, dirname_output, fname_csv, fname_uncorrected_hdf5, fname_hdf5, mdata,
  File "/home/tyler/code/two-photon/two-photon/process.py", line 173, in preprocess
    transform.convert(data, fname_data, df_artefacts, fname_uncorrected)
  File "/home/tyler/code/two-photon/two-photon/transform.py", line 59, in convert
    data_corrected.to_hdf5(fname_data, HDF5_KEY)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 1475, in to_hdf5
    return to_hdf5(filename, datapath, self, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 4577, in to_hdf5
    store(list(data.values()), dsets)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 981, in store
    result.compute(**kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/base.py", line 167, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/base.py", line 452, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/threaded.py", line 76, in get
    results = get_async(
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/local.py", line 486, in get_async
    raise_exception(exc, tb)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/local.py", line 316, in reraise
    raise exc
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/optimization.py", line 961, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 151, in get
    result = _execute_task(task, cache)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/utils.py", line 29, in apply
    return func(*args, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 436, in _pass_extra_kwargs
    return func(*args[len(keys) :], **kwargs)
  File "/home/tyler/code/two-photon/two-photon/transform.py", line 87, in remove_artefacts
    before = chunk[index - 1, row.z_plane, y_slice]
IndexError: index 2 is out of bounds for axis 1 with size 2
tbenst commented 3 years ago

This issue doesn't appear to be related to multiple channels, as I replicated the error on another file that only recorded from one channel. On Oak, the data is at /oak/stanford/groups/deissero/users/tyler/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038.

Sherlock command: python process.py --input_dir /oak/stanford/groups/deissero/users/tyler/b115/ --output_dir /oak/stanford/groups/deissero/users/tyler/b115/process-output/ --recording 2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1:TSeries-lrhab_raphe_stim-40trial-038 --preprocess

Output (run locally):

❯ python ~/code/two-photon/two-photon/process.py --input_dir /data/dlab/b115/ --output_dir /data/dlab/b115/process-output/ --recording 2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1:TSeries-lrhab_raphe_stim-40trial-038 --preprocess
2021-03-31 23:43:31.162 metadata:22 INFO Extracting metadata from xml files:
/data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/TSeries-lrhab_raphe_stim-40trial-038.xml
/data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/TSeries-lrhab_raphe_stim-40trial-038_Cycle00001_VoltageRecording_001.xml
2021-03-31 23:43:34.014 metadata:102 INFO The following metadata is written to: /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/output/metadata.json
{'channels': {0: {'enabled': True, 'name': 'frame starts'},
              1: {'enabled': True, 'name': 'secondary'},
              2: {'enabled': True, 'name': 'winfluo'},
              3: {'enabled': True, 'name': 'Blue'},
              4: {'enabled': True, 'name': 'VR timestamps'},
              5: {'enabled': True, 'name': 'green'},
              6: {'enabled': True, 'name': 'LED'},
              7: {'enabled': True, 'name': 'respir'}},
 'laser': {'power': None, 'wavelength': None},
 'layout': {'frames_per_sequence': 5, 'sequences': 7272},
 'optical_zoom': 1.0,
 'period': 0.065518959,
 'size': {'channels': 1,
          'frames': 7272,
          'x_px': 1024,
          'y_px': 1024,
          'z_planes': 5}}
2021-03-31 23:43:34.135 process:92 INFO Found stim channel "respir", enabled=True
TYLER DEBUG: /data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/TSeries-lrhab_raphe_stim-40trial-038_Cycle00001_VoltageRecording_001.csv
2021-03-31 23:43:49.189 process:161 INFO Read voltage recordings from: /data/dlab/b115/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/TSeries-lrhab_raphe_stim-40trial-038_Cycle00001_VoltageRecording_001.csv, preview:
   Time(ms)  frame starts  secondary  ...     green       LED    respir
0       0.0      0.085144   0.013733  ...  0.019836  0.000916 -0.030212
1       0.2      4.934998   0.013733  ...  0.021057  0.001221 -0.029907
2       0.4      4.934387   0.013123  ...  0.020752  0.001221 -0.029602
3       0.6      4.935303   0.013123  ...  0.020752  0.001221 -0.029297
4       0.8      4.936218   0.013428  ...  0.020752  0.000610 -0.030823

[5 rows x 9 columns]
2021-03-31 23:43:52.034 utils:129 INFO Note: NumExpr detected 36 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2021-03-31 23:43:52.034 utils:141 INFO NumExpr defaulting to 8 threads.
2021-03-31 23:43:52.239 artefacts:15 INFO Stored calculated frame starts in /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/output/frame_start.h5, preview:
Int64Index([1, 334, 668, 1000, 1333], dtype='int64')
2021-03-31 23:43:52.240 artefacts:21 INFO Calculating artefact regions
2021-03-31 23:43:55.193 artefacts:41 INFO Stored calculated artefact positions in /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/output/artefact.h5, preview:
       z_plane  y_min  y_max
frame                       
60           3    479    512
61           1    455    490
61           4    432    467
62           2    412    447
63           0    389    425
2021-03-31 23:46:51.074 tiffdata:43 INFO Found data with shape(frames, z_planes, y_pixels, x_pixels): (7272, 5, 1024, 1024)
2021-03-31 23:46:51.374 transform:41 INFO Writing uncorrected data to /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/hdf5/uncorrected/uncorrected.h5
[                                        ] | 0% Completed |  2.1s/home/tyler/opt/anaconda3/lib/python3.8/site-packages/tifffile/tifffile.py:12019: RuntimeWarning: invalid value encountered in true_divide
  'ZDistance': values[:, 0] / values[:, 1],
[########################################] | 100% Completed |  4min 10.3s
2021-03-31 23:51:05.250 transform:46 INFO Writing corrected data to /data/dlab/b115/process-output/2020-10-28_elavl3-chrmine-Kv2.1_h2b6s_8dpf/fish1/TSeries-lrhab_raphe_stim-40trial-038/hdf5/data/data.h5
[                                        ] | 1% Completed |  9.9s
Traceback (most recent call last):
  File "/home/tyler/code/two-photon/two-photon/process.py", line 354, in <module>
    main()
  File "/home/tyler/code/two-photon/two-photon/process.py", line 114, in main
    preprocess(basename_input, dirname_output, fname_csv, fname_uncorrected_hdf5, fname_hdf5, mdata,
  File "/home/tyler/code/two-photon/two-photon/process.py", line 173, in preprocess
    transform.convert(data, fname_data, df_artefacts, fname_uncorrected)
  File "/home/tyler/code/two-photon/two-photon/transform.py", line 59, in convert
    data_corrected.to_hdf5(fname_data, HDF5_KEY)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 1475, in to_hdf5
    return to_hdf5(filename, datapath, self, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 4577, in to_hdf5
    store(list(data.values()), dsets)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 981, in store
    result.compute(**kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/base.py", line 167, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/base.py", line 452, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/threaded.py", line 76, in get
    results = get_async(
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/local.py", line 486, in get_async
    raise_exception(exc, tb)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/local.py", line 316, in reraise
    raise exc
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/optimization.py", line 961, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 151, in get
    result = _execute_task(task, cache)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/utils.py", line 29, in apply
    return func(*args, **kwargs)
  File "/home/tyler/opt/anaconda3/lib/python3.8/site-packages/dask/array/core.py", line 436, in _pass_extra_kwargs
    return func(*args[len(keys) :], **kwargs)
  File "/home/tyler/code/two-photon/two-photon/transform.py", line 87, in remove_artefacts
    before = chunk[index - 1, row.z_plane, y_slice]
IndexError: index 4 is out of bounds for axis 1 with size 3
tbenst commented 3 years ago

@chrisroat I think the issue is the use of row.z_plane during a call to map_overlap.

In this case, the uncorrected hdf5 file has a dataset with shape (7272, 5, 1024, 1024), and dask chooses a chunk size of (34, 3, 512, 512). z_plane can't be used to index a chunk

chrisroat commented 3 years ago

Doe the chunks parameter in da.from_array solve this issue?

I've learned a bit more about ome.tiff and how to handle z_planes and timepoints. I am likely going to revisit a lot of logic here that kinda of got piled as I learned on-the-job.

One thing I've considered is to process each z-plane independently. I think it would be more efficient (data could be stored in z/time/y/x ordering), but may not be worth it as it could be surprising/confusing if all the rest of the code is time/z/y/x. I value the "principle of least surprise".

tbenst commented 3 years ago

Doe the chunks parameter in da.from_array solve this issue?

yes, chunks=('auto', -1, -1, -1) fixed this by ensuring z indexing is valid.

One thing I've considered is to process each z-plane independently. I think it would be more efficient (data could be stored in z/time/y/x ordering), but may not be worth it as it could be surprising/confusing if all the rest of the code is time/z/y/x. I value the "principle of least surprise".

I prefer TZYX since my analysis usually runs on all z-planes concurrently, but others like NWB store by plane. Does TZYX preclude / add difficulty parallelization for some reason?

chrisroat commented 3 years ago

Not really. Operating on t-slices is perfectly fine. I'd actually prefer to keep TZYX, as it's more typical and natural (and less surprising, I think).