MouseLand / suite2p

cell detection in calcium imaging recordings
http://www.suite2p.org
GNU General Public License v3.0
334 stars 239 forks source link

BUG: Can not handle an h5 file with 5D data structure #1111

Open knowblesse opened 3 months ago

knowblesse commented 3 months ago

Describe the issue:

When suite2p try to open an h5 file with 5D data structure, it can not iterate through frames with this error. TypeError: Indexing arrays must have integer dtypes The cause of the error is the fact that arguments of np.arange receives int / int when hdims != 3.

Reproduce the code example:

# needs an h5 file with 5D array structure.

Error message:

Traceback (most recent call last):
  File "/home/jihoon_jeong/miniforge3/envs/2p_prep/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3550, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-5c404d568d0f>", line 1, in <module>
    f[key][irange, ...]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/jihoon_jeong/miniforge3/envs/2p_prep/lib/python3.9/site-packages/h5py/_hl/dataset.py", line 831, in __getitem__
    selection = sel.select(self.shape, args, dataset=self)
  File "/home/jihoon_jeong/miniforge3/envs/2p_prep/lib/python3.9/site-packages/h5py/_hl/selections.py", line 82, in select
    return selector.make_selection(args)
  File "h5py/_selector.pyx", line 282, in h5py._selector.Selector.make_selection
  File "h5py/_selector.pyx", line 197, in h5py._selector.Selector.apply_args
TypeError: Indexing arrays must have integer dtypes

Version information:

0.14.0

Context for the issue:

No response

knowblesse commented 3 months ago

Double value in np.arange was not the only problem.

Fix 1

In line 63-64, nframes_all which I think a variable representing total number of frames, only consider the first two dimension. This works fine with 4D, but when 5D data is provided, the algorithm must consider number of frames number of planes number of channels

nframes_all = f[key].shape[0] if hdims == 3 else f[key].shape[0] * f[key].shape[1]

Fix 2

I don't know why there is this comment data = (nchan x) (nframes x) nplanes x pixels x pixels

If that comment is right, than...

  1. both nframes x nplanes x pixels x pixels and nchan x nframes x pixels x pixels should work for 4D data,
  2. and 5D data with nchan x nframes x nplanes x pixels x pixels

but the second case of the 4D and the 5D fail because of the line 81 im = f[key][irange, ...]

It would be much safer to check the shape of the data and raise error when the data shape didn't match with ops.npy.

Same thing applies to line 82-83

Fix 3

I don't understand the logic in line 97 i0 = nchannels * ((j) % nplanes)

as ((j) % nplanes) is always j

For the 5D, don't we want something like this?

[fr0 ch0 pl0], [fr0 ch0 pl1], [fr0 ch1 pl0],[fr0 ch1 pl1] [fr1 ch0 pl0], ...

In this case, i0 = j will do the job as it will move one index further for each plane.

Then line 98-99 and line 104-105 should be changed too.

Minor

line 87 if type(im[0, 0, 0]) == np.uint16: would be much direct if im.dtype == np.uint16:


Please correct me if I'm wrong. BTW, it's an awesome package! Thank you for your job!

carsen-stringer commented 3 months ago

Thanks @knowblesse for noticing this and trying to fix this issue. We don't use h5 files ourselves, but I think this deinterleaving of irange is necessary for scanbox converted files? I believe these files had data arrays which were/are 3D/4D which needed to be converted to 5D format to put into our binaries (tiffs have this sort of ordering too from Scanimage). But I honestly don't remember who we added this specific 4D/5D fix for.

If possible, ideally the fix would be backwards compatible, e.g. for Fix 1,

nframes_all = np.prod(np.array(f[key].shape[:-2]))

For Fix 2 -- check if the h5 file has the right dimensions -- if so, skip all of the deinterleaving steps and write straight to binary?

I'm not sure if Fix 3 means there is a bug for these sorts of files already anyway?

knowblesse commented 3 months ago

Hi Dr. Stringer, I'm Ji Hoon, and thank you for your comment! Your feedback is truly an honor! Our lab, led by PI Tristan Geiller at Yale, has recently embarked on building our own data processing pipeline. During this process, we had the pleasure of discovering your remarkable package. After careful consideration, we decided to adopt the H5 format with a 5D array over the TIFF stack option.

I actually fixed the issue, and tested on 3D, 4D (with multiple channels), 4D (with multiple planes), and 5D (both multiple channels and planes). I've submitted a pull request (#1112), and I would greatly appreciate your review.

Thank you!