MouseLand / suite2p

cell detection in calcium imaging recordings
http://www.suite2p.org
GNU General Public License v3.0
334 stars 239 forks source link

BUG: sbx_to_binary computes nframes incorrectly with multiple channels if longer than batch_size frames #1109

Open ethanbb opened 3 months ago

ethanbb commented 3 months ago

Describe the issue:

There seems to be a bug in the way frames are counted when loading one or more Scanbox files. From sbx.py:

for ichunk, onset in enumerate(iblocks[:-1]):
    offset = iblocks[ichunk + 1]
    im = np.array(f[onset:offset, :, :, ndeadrows:, ndeadcols:]) // 2
    im = im.astype(np.int16)
    im2mean = im.mean(axis=0).astype(np.float32) / len(iblocks)
    for ichan in range(nchannels):
        nframes = im.shape[0]
        im2write = im[:, :, ichan, :, :]
        for j in range(0, nplanes):
            if iall == 0:
                ops1[j]["meanImg"] = np.zeros((im.shape[3], im.shape[4]),
                                              np.float32)
                if nchannels > 1:
                    ops1[j]["meanImg_chan2"] = np.zeros(
                        (im.shape[3], im.shape[4]), np.float32)
                ops1[j]["nframes"] = 0
            if ichan == nfunc:
                ops1[j]["meanImg"] += np.squeeze(im2mean[j, ichan, :, :])
                reg_file[j].write(
                    bytearray(im2write[:, j, :, :].astype("int16")))
            else:
                ops1[j]["meanImg_chan2"] += np.squeeze(im2mean[j, ichan, :, :])
                reg_file_chan2[j].write(
                    bytearray(im2write[:, j, :, :].astype("int16")))

            ops1[j]["nframes"] += im2write.shape[0]
            ops1[j]["nframes_per_folder"][ifile] += im2write.shape[0]
    ik += nframes
    iall += nframes

Note that ops1[j]["nframes"] (which becomes ops['nframes']) is updated with the number of frames in the current chunk once per channel, whereas it should be independent of the number of channels. The exception is for the first chunk, where iall is still 0 for each iteration of the ichan loop, so ops1[j]["nframes"] gets reset to 0 for each iteration of this inner loop. Thus, there is only a bug when the total number of frames is greater than the chunk size, which defaults to 500.

Please use the files here to replicate with the code example: https://drive.google.com/drive/folders/1xJ6zG52rMNEKHAN5856KW8PQeeWjxFAB?usp=drive_link

Reproduce the code example:

# pwd should be a writable folder that contains 508_000_000.mat and 508_000_000.sbx
# this is a 1-plane, 2-channel, 1000-frame, 512 x 796 pixel dataset.
import suite2p
import os

ops = suite2p.default_ops()
try:
  os.mkdir('suite2p')
except FileExistsError:
  pass

ops['data_path'] = ['.']
ops['save_path0'] = '.'
ops['save_folder'] = 'suite2p'
ops['fast_disk'] = 'suite2p'
ops['nplanes'] = 1
ops['nchannels'] = 2
ops['input_format'] = 'sbx'
ops['do_registration'] = False
ops['roidetect'] = False
ops['neuropil_extract'] = False
ops['spikedetect'] = False

ops_out = suite2p.run_s2p(ops=ops, db={})

real_frames = 1000
assert ops_out['nframes'] == real_frames, f'Expected {real_frames} frames, got {ops_out["nframes"]}'

Error message:

Traceback (most recent call last):
  File "/synology/eblackwood/VR_learning/2p_testing/508/test_folder/test_bug.py", line 27, in <module>
    assert ops_out['nframes'] == real_frames, f'Expected {real_frames} frames, got {ops_out["nframes"]}'
AssertionError: Expected 1000 frames, got 1500

Version information:

suite2p v0.14.4.dev32+gf500c1a.d20240311

Context for the issue:

This doesn't cause an error message when running the pipeline, but it may cause some of the processing steps to give incorrect results (haven't looked into it). It caused an error in a script I wrote for manual cropping when I tried to use nframes from the returned ops to determine how large of a BinaryFile I needed.

ethanbb commented 3 months ago

Looking a little closer, I think there is at least one other issue with this code block. Initializing "meanImg" to 0 in the if iall == 0 block happens for the 2nd channel as well, meaning that the first 500 frames of the first channel get lost.

I think something like the following would fix it, happy to make a pull request if it makes sense:

for ichunk, onset in enumerate(iblocks[:-1]):
    offset = iblocks[ichunk + 1]
    im = np.array(f[onset:offset, :, :, ndeadrows:, ndeadcols:]) // 2
    im = im.astype(np.int16)
    im2mean = im.mean(axis=0).astype(np.float32) / len(iblocks)
    for ichan in range(nchannels):
        nframes = im.shape[0]
        im2write = im[:, :, ichan, :, :]
        for j in range(0, nplanes):
            if ichan == nfunc:
                if iall == 0:
                  ops1[j]["meanImg"] = np.zeros((im.shape[3], im.shape[4]), np.float32)
                  ops1[j]["nframes"] = 0

                 ops1[j]["meanImg"] += np.squeeze(im2mean[j, ichan, :,
                 ops1[j]["nframes"] += im2write.shape[0]
                 ops1[j]["nframes_per_folder"][ifile] += im2write.shape[0]
                 reg_file[j].write(bytearray(im2write[:, j, :, :].astype("int16")))
              else:
                  if iall == 0:
                    ops1[j]["meanImg_chan2"] = np.zeros(
                          (im.shape[3], im.shape[4]), np.float32)

                  ops1[j]["meanImg_chan2"] += np.squeeze(im2mean[j, ichan, :, :])
                  reg_file_chan2[j].write(
                      bytearray(im2write[:, j, :, :].astype("int16")))
      iall += nframes