flatironinstitute / CaImAn

Computational toolbox for large scale Calcium Imaging Analysis, including movie handling, motion correction, source extraction, spike deconvolution and result visualization.
https://caiman.readthedocs.io
GNU General Public License v2.0
639 stars 370 forks source link

Matlab support broken due to changes in scipy #1336

Closed pgunn closed 7 months ago

pgunn commented 7 months ago

scipy removed _open_file from scipy.io.matlab.mio, and that entire namespace is going away in SciPy 2.0.0. We were using this to handle reading .mat files (both old-style ones and new-style hdf5 ones)

The most obvious replacement, scipy.io.matlab.loadmat(), throws an exception rather than letting us read versions from attributes, as so:

>>> fh = scipy.io.matlab.loadmat('example_movies/blood_vessel_10Hz.mat')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/home/pgunn/miniconda3/envs/caibuild/lib/python3.11/site-packages/scipy/io/matlab/_mio.py", line 226, in loadmat
    MR, _ = mat_reader_factory(f, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/home/pgunn/miniconda3/envs/caibuild/lib/python3.11/site-packages/scipy/io/matlab/_mio.py", line 80, in mat_reader_factory
    raise NotImplementedError('Please use HDF reader for matlab v7.3 '
NotImplementedError: Please use HDF reader for matlab v7.3 files, e.g. h5py

This probably won't be that hard to handle (and in some ways probably mirrors some of the recent sbx improvements @ethanbb did for us), unless we bump into deeper API changes.

Currently demo_Ring_CNN is broken because of these changes.

pgunn commented 7 months ago

It looks like the current codebase was able to load() a traditional .mat file, but it was not able to get_file_size() on it, meaning it would take unusually fully parameterised cnmfparams for this to have worked (even before the current issue). Instead it usually just acted as a way to quickly push the user off to using hdf5 support if it were in a modern not-really-a-mat format.

There's a working old-style datafile on our clustered filesystem here: /mnt/ceph/data/neuro/VariousLabeling/Jan_ROIS/Jan40_exp2_001_ROIs.mat

We could decide to stop supporting classic mat files (only supporting the new hdf5 style), and simplify the code a lot, or we could rebuild classic mat support using the newer APIs in scipy; whatever we do we need to not use the old scipy APIs as they no longer work.

pgunn commented 7 months ago

That datafile only works so far into the load() chain, because the subset of the matfile is hardcoded as 'data' in our mat loading code and it uses 'M' instead. If we're going to support classic mat files we'd need to possibly do a number of additional changes to the file.

I'm leaning towards removing support for old-style mat files, only supporting the hdf5-based new mat format. Most mat files I can find seem to be new-style, and asking people to convert to new-style should be easy (even in matlab).

ethanbb commented 7 months ago

For what it's worth, I've used pymatreader in the past for reading (not writing) both types of .mat files. Looking at it now, I guess it would require installing from another conda channel.

pgunn commented 7 months ago

If we're to go down this route, this is the exception we can catch if the h5py module tries to open a classic mat file:

>>> fh = h5py.File(fn, 'r')
Traceback (most recent call last):
  File "h5py/h5f.pyx", line 102, in h5py.h5f.open
OSError: Unable to synchronously open file (file signature not found)

I think this is probably the right choice.

pgunn commented 7 months ago

@ethanbb I suspect users probably won't complain too much if we switch to only supporting the newer mat format? That's pretty easy.

ethanbb commented 7 months ago

I would hesitate a little because the old-style mat files are still the default (perhaps will forever remain so?); while I usually change that default, it's still not that uncommon to realize that I have an older-format file because I forgot to on some machine. I suppose it's easy to write a script to convert a bunch of files to the new format...though it might take a while to run.

That being said, I don't have the full context around how difficult supporting the older format would be - trying to go through your comments and figure it out. It sounds like getting the file size may be more difficult and does not currently work?

ethanbb commented 7 months ago

Ah, I just realized you're saying that the current codebase has never (?) worked for classic .mat files anyway, despite the fact that the load function doesn't error with them (but isn't robust to different variable names etc.) Also, since they have a limited file size, they don't seem very useful for storing calcium data. Understanding that, I have no issues with what you're suggesting.

pgunn commented 7 months ago

It probably worked more generally at some point in the distant past, just not anytime recently. The load() function worked, the get_file_size() function did not, so before the scipy changes, you'd need to tell caiman in a parameters object what the dimensions are rather than having the code look it up for you by looking at your datafiles.

If anyone shows up with old datafiles and they tell us they don't want to convert for some reason, I can go through the effort of reviving and then finishing the code, but I don't want to add bulk to the codebase for ancient formats unless there's a good case to be made.

pgunn commented 7 months ago

Fixed in dev