flatironinstitute / CaImAn

Computational toolbox for large scale Calcium Imaging Analysis, including movie handling, motion correction, source extraction, spike deconvolution and result visualization.
https://caiman.readthedocs.io
GNU General Public License v2.0

Proposal: Implement lazy computing arrays for CNMF outputs #1155

Open kushalkolar opened 1 year ago

kushalkolar commented 1 year ago

Visualization of the reconstructed movie A * C, the reconstructed background b * f, and the residuals Y - (A * C) - (b * f) is extremely useful for many things: component evaluation, quality evaluation, parameter evaluation, etc. These dense arrays are often too large to fit into RAM, and we frequently receive questions about this from users both here and on gitter. I propose implementing these as lazy arrays that perform the computation on the fly. This is already implemented in mesmerize-core, so I propose simply moving that implementation here. The only dependencies are numpy and scipy, which caiman already has. We could alternatively implement it using dask, but that would add a large dependency.
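
To make the idea concrete, here is a minimal, hypothetical sketch (not the mesmerize-core implementation; the class and variable names are made up) of a numpy-like wrapper that computes frames of the reconstruction A * C only when indexed:

```python
import numpy as np
from scipy import sparse


class LazyReconstruction:
    """Hypothetical sketch: compute frames of A @ C on demand, never the full movie."""

    def __init__(self, A, C):
        self.A = A  # (n_pixels, n_components), typically a scipy sparse matrix
        self.C = C  # (n_components, n_frames)
        self.shape = (C.shape[1], A.shape[0])  # (frames, pixels)

    def __getitem__(self, idx):
        # compute only the requested frame(s); the full dense array is never built
        return np.asarray(self.A @ self.C[:, idx]).T


# toy data: 100 pixels, 5 components, 1000 frames
A = sparse.random(100, 5, density=0.2, format="csc")
C = np.random.rand(5, 1000)

rcm = LazyReconstruction(A, C)
frame10 = rcm[10]   # a single frame, computed on the fly
chunk = rcm[10:40]  # a 30-frame chunk
```

Because only the indexed slice is ever materialized, the RAM cost scales with the view, not the movie.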

pgunn commented 1 year ago

How does this approach compare to fully computing the arrays and having them reside on disk, instead of computing them lazily? Why is lazy computation preferable?

kushalkolar commented 1 year ago

The full dense array is often hundreds of GB, and the full array just isn't necessary, even for things like max projections, which can be calculated without the full array because the reconstruction is an outer product.
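
To illustrate the simplest case: the mean projection follows directly from the linearity of the matrix product, so it can be computed from A and C alone without ever forming the dense movie (toy shapes, illustrative only):

```python
import numpy as np
from scipy import sparse

# toy stand-ins for real data, where A @ C would be hundreds of GB
A = sparse.random(100, 5, density=0.2, format="csc")  # spatial footprints
C = np.random.rand(5, 1000)                           # temporal traces

# mean_t(A @ C) == A @ mean_t(C): a (pixels,) image from tiny inputs,
# never materializing the (pixels, frames) dense reconstruction
mean_image = np.asarray(A @ C.mean(axis=1)).ravel()

# brute-force check, feasible only on toy data
assert np.allclose(mean_image, np.asarray(A @ C).mean(axis=1))
```

The same outer-product structure is what makes the per-component max/min/std projections discussed later in this thread cheap as well.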

EricThomson commented 1 year ago

This would typically be used for plotting functions in mesmerize, right? Is that the main use case?

E.g., when plotting a derived quantity (such as A*C) that you don't want to (and sometimes can't) load into memory, it would be a waste to save those matrix products as memmapped files; with lazy computation you can easily calculate just the bits you are currently viewing, on the fly, using indexing (even if the individual matrices are memmapped). Is the main payoff with dask that it gives you blocks?

pgunn commented 1 year ago

How trivial are the computations? I get the storage benefits (and they could be pretty substantial)

With lazy-loading, the data is essentially read-only when done, which is good for data provenance and presumably means all the compute costs are already paid for any subsequent analysis, and it's going to be potentially compatible with other software that might want to load the format.

Maybe none of this is a concern if the lazy computation really is that deterministic and it doesn't get into the main caiman algorithms, particularly anything like setup_cluster().

pgunn commented 1 year ago

I think I misunderstood the proposal on first read; assuming what you meant was exactly the computations above, this seems reasonable, although is there a reason to have the arrays themselves do it rather than have explicit code do the needed computations piecewise?

(If Y, A, C, b, and f are already present in output directories, then maybe there's no need to worry about compatibility with other software that might want to analyse our output, because they can do the computation pretty trivially themselves, in toto with memmap or piecewise?)

kushalkolar commented 1 year ago

> is there a reason to have the arrays themselves do it rather than have explicit code do the needed computations piecewise?

The reason for having a dedicated class to do this is that vanilla numpy doesn't have lazy compute operations (even if we used dask we would need to add a few things), and it is not trivial to create the RCM (reconstructed movie), RCB (reconstructed background) and residuals with various options, such as using specific components or getting projections. Also, by creating a numpy-like array type, these lazy arrays can be fed into most visualization tools that work with numpy-like arrays. So you can effectively scroll through multi-hundred-GB movies without computing the entire array and storing it on disk. You often do this multiple times for the same movie, and the sparse matrices only take up a few GB at most. The lazy array implementation I have also gives you projections along the time axis, which would take forever to compute by "brute force".

This is basically what I want to move to caiman from mesmerize-core: https://github.com/nel-lab/mesmerize-core/blob/master/mesmerize_core/arrays/_cnmf.py

> (If Y, A, C, b, and f are already present in output directories, then maybe there's no need to worry about compatibility with other software that might want to analyse our output, because they can do the computation pretty trivially themselves, in toto with memmap or piecewise?)

Yes, if they really want to (though I don't know why anyone would want to do this anyway), they can do it themselves.

pgunn commented 1 year ago

Oh, that "it is not trivial to create" makes me worried.

One thing I think we should aim for is making it easy for external code to do analysis of the results of a caiman run. Ideally other tools should be able to scoop up the results of Caiman and do their thing.

Let's think (and chat) about this some more. I'm not sure what the right thing to do is.

kushalkolar commented 1 year ago

What I mean by "not trivial to create" is that you would have to index the components manually if you wanted to create the RCM from specific components, whereas here you can provide them as arguments to the constructor and get back an array that behaves like numpy.

Lazy array:

```python
rcm = LazyRCM(spatial=estimates.A, temporal=estimates.C, components=[3, 1, 4, 5])

# slice arbitrary indices of the rcm
rcm[10]
rcm[10:40]

# this allows it to easily work with visualization tools: as long as enough of the
# numpy array API is implemented, it will work in most tools.

# get the max image, extremely useful for a full view of good and bad components
# and their structure; for example, if you're interested in somatic imaging and used a
# high CNN threshold, you will distinctly notice dendrites in the "bad components" and
# somata in the good components
rcm_good = LazyRCM(spatial=estimates.A, temporal=estimates.C, components=estimates.idx_components)
rcm_bad = LazyRCM(spatial=estimates.A, temporal=estimates.C, components=estimates.idx_components_bad)

# projections
rcm_good.max_image
rcm_good.mean_image

# my implementation has projections for min, max, mean and std. Since we're computing
# an outer product, we can get these values on a per-component basis and form the
# image. They are very expensive to compute by "brute force".
```

Compared to current caiman (limited functionality):

```python
time_index = 10
rcm_10 = cnmf.A[:, components].dot(cnmf.C[components, time_index])

# frames 10 - 40
rcm_10_40 = cnmf.A[:, components].dot(cnmf.C[components, 10:40])

# as you can see, you have to manually do the matrix product for each slice; you can't
# just get an array-like object and feed it to a visualization tool.
```

pgunn commented 1 year ago

Ah, ok. This seems like a fair proposal then (and that's not a lot of code).

I think it'd be good (though probably not strictly necessary) to also give the user an option to precompute the whole thing. But as long as we really document how it all works, anyone who needs that could do it iteratively; it wouldn't be building anything that locks someone building alternative visualisation methods out of our results.

kushalkolar commented 1 year ago

The implementation in mesmerize-core does have an as_numpy() :D https://github.com/nel-lab/mesmerize-core/blob/523b861d600bb84934e0677682b3c789c72c9160/mesmerize_core/arrays/_base.py#L101-L120

I think we can offer to save the full array chunk-wise using hdf5 (or memmap?), rolling through time and writing frame-by-frame to prevent the RAM issues. Anyway, I don't know of a sane use for such a feature, so I'm fine with any option for this. I did put a placeholder method in the mesmerize-core implementation for saving to hdf5, so I have thought about this.
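
A rough sketch of what that chunk-wise save could look like (hypothetical filename and toy shapes; assumes h5py, rolling through time so only a small block is ever in RAM):

```python
import numpy as np
import h5py
from scipy import sparse

# toy stand-ins: 100 pixels, 5 components, 1000 frames
A = sparse.random(100, 5, density=0.2, format="csc")
C = np.random.rand(5, 1000)
n_pixels, n_frames = A.shape[0], C.shape[1]

with h5py.File("rcm_full.h5", "w") as f:  # hypothetical filename
    dset = f.create_dataset(
        "rcm", shape=(n_frames, n_pixels), dtype=np.float32,
        chunks=(1, n_pixels),  # one frame per hdf5 chunk
    )
    step = 100  # frames computed and written per iteration
    for t in range(0, n_frames, step):
        # only `step` frames of the dense reconstruction exist in RAM at once
        block = np.asarray(A @ C[:, t:t + step]).T
        dset[t:t + step] = block
```

Peak memory here is bounded by `step * n_pixels` floats rather than the full movie.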

I can't think of use cases for serializing the lazy array to disk, because it can easily be constructed from the spatial and temporal components, which can already be serialized from the CNMF object. Unless you or @EricThomson really think there's a use for this? I see it as an unnecessary serialization option to maintain that doesn't have much use as far as I'm aware.

Anyways, I can start a PR right away but wanted to check in first.

kushalkolar commented 1 year ago

One thing I am not sure about is what the RCM, RCB and residuals look like with 3D CNMF outputs; we'll need this to work with those as well. I can take a look at the 3D demo and figure it out.

pgunn commented 1 year ago

The argument for offering to serialise it to disk is to make it really easy for other software packages, possibly written in other languages and also possibly without some of the more dynamic features of Python, to be able to easily do their own visualisation.

Maybe it's unimportant. I might be worrying about that angle too much.

One thing: I think your current implementation looks fairly purpose-specific (meaning, for RCM). This is one of those cases where, if there's an easy way to decouple the purpose from the mechanism, it might be worthwhile. Might I suggest letting the idea stew in your head over the weekend to see if maybe another design comes to mind where it's just a LazyArray, with the operations you actually want to perform passed in, like a map or lambda or comprehension, rather than having them baked in together?

If you're busy or don't want to explore the idea, that's fine, but I suspect it may be a fruitful direction of inquiry.

pgunn commented 1 year ago

(it may also be that having these things be too flexible would be incompatible with having them be performant)

kushalkolar commented 1 year ago

> see if maybe another design comes to mind where it's just a LazyArray, with the operations you actually want to perform passed in, like a map or lambda or comprehension, rather than having them baked in together

Yeah, we've been pondering this for a long time now, and this is roughly what dask does if you chunk along C (all the CNMF outputs would have to be dask arrays for this to work). It still wouldn't give you things like max_image, which are really useful for CNMF, so even if we used dask we'd still need to add that.

Basically, dask has some optimizations for lazy linear algebra when it's possible to get slices of results without computing the entire array. We need some of that here, but there are also CNMF-specific things we can take advantage of; that's why I think a CNMF-specific implementation in caiman would be useful. These lazy arrays are useful for RCM, RCB and residuals as far as I know. @EricThomson any thoughts on whether it's useful elsewhere in caiman?
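
For reference, a minimal sketch of the dask approach being described (dense stand-ins for A, since dask's handling of scipy sparse matrices is exactly the complication; toy shapes throughout):

```python
import numpy as np
import dask.array as da

# toy dense stand-ins: 100 pixels, 5 components, 1000 frames
A = np.random.rand(100, 5)
C = np.random.rand(5, 1000)

# chunk C along time; the matmul only builds a task graph, nothing is computed yet
rcm = da.from_array(A, chunks=(100, 5)) @ da.from_array(C, chunks=(5, 100))

# .compute() on a slice forces only the chunks needed for that frame
frame = rcm[:, 10].compute()
assert np.allclose(frame, A @ C[:, 10])
```

This gives lazy slicing for free, but, as noted above, CNMF-specific conveniences like max_image would still have to be added on top.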

EricThomson commented 1 year ago

Following, and I do think this is important: we've discussed this a lot in the context of mesmerize. I'm finishing the quilt view branch today (thanks for the name idea, mesmerize) but will come back more thoughtfully later.

Just my first thoughts so I don't forget:

It could be useful with Johannes' new opencv lazy visualization stuff: we could throw it in there at the end of the notebooks, where we typically show reconstructed movies etc. Currently that new opencv machinery is useful for stuff on disk, but we could bootstrap it into later visualizations using this machinery.

I'm wondering if there could be uses beyond visualization: e.g., when it is useful to just extract such quantities outside of a vis context (not that that is needed to justify its utility -- that is reason enough for people with very large movies).

Obviously with fastplotlib it is important and useful: there are visualizations I currently want to do that are just not possible using other tools out there, so settling on a roadmap for incorporating fastplotlib is rising in importance. But this is a discussion point we should take up in #1118.

I'd want to go deeper in the weeds with the details (e.g., profiling dot vs @ as we've discussed), and whether we want an abstract factory class as part of special package, or just individual implementations as needed (I think the factory is probably good, just mentioning here as discussion point).

Anyway, just my quick thoughts; I'm finishing up a bunch of things today, as storms and power outages in NC this week set me back a lot, but I will loop back to this later!