flatironinstitute / CaImAn

Computational toolbox for large scale Calcium Imaging Analysis, including movie handling, motion correction, source extraction, spike deconvolution and result visualization.
https://caiman.readthedocs.io
GNU General Public License v2.0

Ideas about storage engine #988

Open kushalkolar opened 2 years ago

kushalkolar commented 2 years ago

Hi Pat, it was great to meet you a few weeks ago at the workshop! I had some ideas about the storage engine; this is just brainstorming. I don't know how this would apply to the online algorithms (OnACID etc.) since I've never used them.

Regarding downstream compatibility with mesmerize-core: the outputs of a batch item, which is a single run of an algorithm on a single movie, are organized in a single dir. If there are multiple runs on the same input movie, each run has its own output dir.

For mcorr, I currently use the CAIMAN_TEMP env var to store the mcorr memmap outputs. Correlation and PNR images are saved using np.save() to the output dir.

For CNMF, the hdf5 output files are saved to the output dir using cnmf.CNMF.save(), just like how you would use it in a notebook. I've tried to make it as close as possible. The CNMF C-order memmap file is created using caiman.save_memmap(), and after the CNMF run is finished it's just moved to the output dir. Correlation and PNR images are again just saved using np.save().
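
To make the layout above concrete, here is a minimal sketch of the "one dir per run, move temp outputs in afterwards" pattern. It uses only the standard library; `make_run_dir` and `collect_output` are hypothetical helper names for illustration, not mesmerize-core or CaImAn functions, and naming the dir with a UUID is an assumption.

```python
import shutil
import uuid
from pathlib import Path


def make_run_dir(root: Path) -> Path:
    """Create a fresh output dir for one run (one algorithm on one movie).

    Hypothetical helper: the UUID naming is an assumption for this sketch.
    """
    run_dir = root / str(uuid.uuid4())
    # exist_ok=False so two runs can never silently share a dir
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def collect_output(temp_file: Path, run_dir: Path) -> Path:
    """Move a file written to a temp location into the run's output dir."""
    dest = run_dir / temp_file.name
    shutil.move(str(temp_file), str(dest))
    return dest
```

Two runs on the same input movie would each call `make_run_dir` once, so their outputs never collide.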

pgunn commented 2 years ago

Some of this aligns with what I have in mind; it'd be nice to try to be smart about when a directory can be reused and when it can't, where we perhaps hash the parameters towards that end. So if someone starts another run with different params they get a different directory. Hopefully a full design that does a good enough job at everything will crystallise soon.
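
As a rough illustration of the "hash the parameters" idea (not a proposed design), the sketch below derives a directory name from a digest of the parameters. It assumes the parameters can be expressed as a JSON-serializable dict; `params_digest` and `run_dir_for` are hypothetical names, and the parameter keys in the usage note are placeholders.

```python
import hashlib
import json
from pathlib import Path


def params_digest(params: dict) -> str:
    """Return a short, stable digest of a JSON-serializable params dict."""
    # sort_keys makes the digest independent of key insertion order
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def run_dir_for(root: Path, algo: str, params: dict) -> Path:
    """Map (algorithm, params) to a directory path; same params, same dir."""
    return root / f"{algo}-{params_digest(params)}"
```

With this, `run_dir_for(root, "cnmf", {"a": 1, "b": 2})` and `run_dir_for(root, "cnmf", {"b": 2, "a": 1})` point at the same directory, while changing any value gives a different one. Values that are not JSON-serializable (NumPy arrays, for instance) would need converting first.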

kushalkolar commented 2 years ago

I like the hashing idea! Maybe implementing __hash__ for CNMFParams is a starting point?
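
A minimal sketch of what that could look like, using a hypothetical stand-in class (`ParamsSketch` is not the real CNMFParams, and its internals are assumed to be a JSON-serializable dict of parameter groups). One caveat: Python's built-in `hash()` is salted per process for strings, so `__hash__` works for in-memory dict/set lookups but not for naming directories on disk; a separate stable digest covers that case.

```python
import hashlib
import json


class ParamsSketch:
    """Hypothetical stand-in for a parameter container; not the CaImAn class."""

    def __init__(self, **groups):
        # Assumed layout: group name -> dict of JSON-serializable values
        self._groups = groups

    def _canonical(self) -> str:
        # Key order is normalized so equal params give equal strings
        return json.dumps(self._groups, sort_keys=True)

    def stable_digest(self) -> str:
        """Digest that is identical across processes; usable in dir names."""
        return hashlib.sha256(self._canonical().encode("utf-8")).hexdigest()

    def __eq__(self, other):
        if not isinstance(other, ParamsSketch):
            return NotImplemented
        return self._canonical() == other._canonical()

    def __hash__(self):
        # Consistent with __eq__, but only stable within one process
        return hash(self._canonical())
```

Since a params object is normally mutable, its hash would change whenever a value is edited, so it should not be modified while it is used as a dict key or set member.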