Closed: kushalkolar closed this 9 months ago
It will probably take me a while to really think about this properly in the weeds, but here are big-picture thoughts for now (also, forgive me if this is somewhat terse; I'm jet-lagged and have to rush to something soon :smile: ):
1) Does this belong in cnmf, utils, or visualization? Is this mainly an aid for visualization? If so, should it go with utils or vis? Speaking generally, isn't this more broadly applicable than cnmf, which is a folder fairly explicitly reserved for the cnmf algorithm machinery, while other stuff goes elsewhere? So (again, thinking big-picture modular decomposition here), where should this really go? I'm thinking not the cnmf dir, as this can also ultimately be really useful for motion correction visualizations and things like that (and future decompositions, if we provide them).
2) I don't love the naming: RCM/RCB are names people won't understand by looking at them. We should think of more transparent names. I know 'reconstructed movie' and 'reconstructed background' are probably too long to write out fully, but rcb and rcm are abbreviations, and I prefer not to have full-blown abbreviations that people will have to learn.
3) Background will ultimately be general, not just CNMF, right? Currently bg is just CNMF, but we'd need CNMFE added (i.e., need `b0` and `W` for CNMFE), right? This is probably important to get updated on the first pass.
4) I would prefer to use the more numpythonic `@` for matrix multiplication. Last time I checked there was no difference in terms of profiling, but semantically I think `@` is better/more transparent than `dot`, which will tend to confuse people who will think "Why are they doing a dot product rather than matrix multiplication?" As time goes on this will get more confusing to modern readers. I know caiman has `dot` all over the place, but that's because it is basically legacy linear algebra code from before numpy changed the notation, and we should just update it everywhere at some point. [Hackathon! :smile: ]
These are just my big-picture thoughts for now; I will have time to go into the weeds next week.
> Does this belong in cnmf, utils, or visualization? Is this mainly an aid for visualization? [...] I'm thinking not the cnmf dir, as this can also ultimately be really useful for motion correction visualizations and things like that (and future decompositions, if we provide them).
This is not a visualization aid, it's a back-end data array structure that makes larger-than-RAM array visualization possible. You can use it for calculations beyond just visualization. This is very CNMF-specific, and not useful for (and completely unrelated to) motion correction visualization.
> I don't love the naming: RCM/RCB are names people won't understand by looking at them. We should think of more transparent names. [...]
Any ideas?
> Background will ultimately be general, not just CNMF, right? Currently bg is just CNMF, but we'd need CNMFE added (i.e., need `b0` and `W` for CNMFE), right? [...]
Right, I will figure this out.
> I would prefer to use the more numpythonic `@` for matrix multiplication. Last time I checked there was no difference in terms of profiling, but semantically I think `@` is better/more transparent than `dot` [...] we should just update it everywhere at some point. [Hackathon! :smile: ]
I tested this and yes, it's the same between `dot` and `@`. BUT `np.outer()` behaves weird for some reason.
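For reference, a quick check of the equivalence (the comment above doesn't say exactly what was weird about `np.outer`, but a likely candidate is that it flattens its inputs, so it silently accepts shapes that `@` would reject; all array names here are just illustrative):

```python
import numpy as np

A = np.random.rand(100, 20)   # spatial components: (n_pixels, n_components)
C = np.random.rand(20, 500)   # temporal components: (n_components, n_frames)

# `@` and `.dot()` give identical results for 2D matrix multiplication
assert np.allclose(A @ C, A.dot(C))

b = np.random.rand(100, 1)    # background spatial: (n_pixels, 1)
f = np.random.rand(1, 500)    # background temporal: (1, n_frames)

# np.outer ravels its inputs before computing the outer product,
# so it ignores the shapes that `@` checks:
assert np.allclose(np.outer(b, f), b @ f)      # agrees here
assert np.outer(b.T, f).shape == (100, 500)    # still (100, 500)...
# ...whereas b.T @ f raises a ValueError, since (1, 100) @ (1, 500) is not aligned
```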
So for `LazyArrayResiduals` all image projections except std are implemented. Computing `std(Y - (A * C) - (b * f))` is non-trivial; this is probably best handled with dask, so we can hopefully have this in the future!

- Basically, standard deviation is not distributive: `std(Y) - std(A * C) - std(b * f)` != `std(Y - (A * C) - (b * f))`, because `sqrt(x - y) != sqrt(x) - sqrt(y)`
- The rest of the operations are distributive, which is why we can exploit these tricks :D
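A tiny numeric illustration of the distributivity point (the arrays here are random stand-ins, not the PR's actual variables):

```python
import numpy as np

rng = np.random.default_rng(0)
Y, AC, bf = rng.random((3, 100, 500))  # stand-ins for Y, A * C, and b * f

R = Y - AC - bf  # residuals

# mean is linear, so it distributes over the subtraction and can be
# computed from the three pieces without ever forming R:
assert np.allclose(R.mean(), Y.mean() - AC.mean() - bf.mean())

# std is not linear, so the same trick fails:
print(R.std())                        # ~0.5
print(Y.std() - AC.std() - bf.std())  # ~-0.29, clearly different
```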
@EricThomson we will need to chat sometime about `b0` and `W`, I'm not familiar with it :D
This shows how useful the residuals projection can be: left is K = 1, right is K = 4; you can see neurons left over in the residuals projection.
I should make `min` and `max` methods, and it actually should be possible to get min/max across an axis. Implement `nanmin()`/`nanmax()` as just aliases to `min()`/`max()`.
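A hypothetical sketch of how an axis-wise projection could work on a lazy `A @ C` without materializing the full movie (the function and names below are made up for illustration, not part of the PR):

```python
import numpy as np

def lazy_max_projection(A, C, chunk_size=100):
    """Max over the time axis of A @ C, computed chunk by chunk."""
    # A: (n_pixels, n_components), C: (n_components, n_frames)
    out = np.full(A.shape[0], -np.inf)
    for start in range(0, C.shape[1], chunk_size):
        chunk = A @ C[:, start:start + chunk_size]   # (n_pixels, <=chunk_size)
        np.maximum(out, chunk.max(axis=1), out=out)  # running max per pixel
    return out  # (n_pixels,) max-projection image, still flattened
```

`min` is the same with `np.inf` and `np.minimum`, and aliasing `nanmin()`/`nanmax()` to `min()`/`max()` presumably works because the reconstructions shouldn't contain NaNs.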
> So for `LazyArrayResiduals` all image projections except std are implemented. Computing `std(Y - (A * C) - (b * f))` is non-trivial [...]
>
> @EricThomson we will need to chat sometime about `b0` and `W`, I'm not familiar with it :D
I just found two methods in `cnm.estimates` I hadn't seen before: `compute_background()` and `compute_residuals()`, and both handle the ring model and all its funky tributaries nicely.
> I would prefer to use the more numpythonic `@` for matrix multiplication. [...] I know caiman has `dot` all over the place, but that's because it is basically legacy linear algebra code from before numpy changed the notation, and we should just update it everywhere at some point. [Hackathon! :smile: ]
If we think it should be done, let's just do it. Converting all instances of `.dot()` to `@` is drudgework; not really suitable for a hackathon. I can probably do it all with regexes without taking much time. Maybe even yapf can do it for us, or some other tool.
> > I would prefer to use the more numpythonic `@` for matrix multiplication. [...]
>
> If we think it should be done, let's just do it. Converting all instances of `.dot()` to `@` is drudgework; not really suitable for a hackathon. I can probably do it all with regexes without taking much time. Maybe even yapf can do it for us, or some other tool.
Not every place where `.dot()` is used is meant to be a matrix multiplication, so it would be dangerous to do it like that. And without tests for all the linalg in the codebase I wouldn't do it. There is nothing that we gain from changing every `.dot()` to `@` where it's possible. I think most people familiar with numpy will know that `.dot()` and `@` behave the same when it comes to matrix multiplication.
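To make the danger concrete: `np.dot` and `@` only coincide in the 1D and 2D cases, so a blind regex swap would silently change behavior elsewhere:

```python
import numpy as np

# scalars: dot accepts them, @ does not
np.dot(3.0, 4.0)   # fine -> 12.0
# 3.0 @ 4.0        # TypeError

# stacked (N-D) arrays: the two operations compute different things
a = np.random.rand(2, 3, 4)
b = np.random.rand(2, 4, 5)
(a @ b).shape       # (2, 3, 5): matmul treats the leading axis as a batch
np.dot(a, b).shape  # (2, 3, 2, 5): dot pairs a's last axis with b's second-to-last
```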
So I guess we've arrived at `ModelMovie`, `ModelBackground`, `ModelResiduals`? @EricThomson
> So I guess we've arrived at `ModelMovie`, `ModelBackground`, `ModelResiduals`? @EricThomson
I'm ok with that, or `Reconstructed`, or `Estimated`. In my opinion -- you pick; ultimately any of them work :smile:
> So I guess we've arrived at `ModelMovie`, `ModelBackground`, `ModelResiduals`? @EricThomson
>
> I'm ok with that, or `Reconstructed`, or `Estimated`. In my opinion -- you pick; ultimately any of them work :smile:
Model is shorter; that's the right choice unless we can find something similarly short that we like better.
> So I guess we've arrived at `ModelMovie`, `ModelBackground`, `ModelResiduals`? @EricThomson
Thinking about this more: one option for the three parts is `NeuralModel` (this is `AC`, right?), `BackgroundModel` (this is going to be `bf` or the ring model of the background), and `Residual` (this is basically `Y - NeuralModel - BackgroundModel`, so it is basically the error term: all the stuff the model doesn't capture, so it isn't really technically part of the model at all).
Or even "movie" . So something like NeuralMovie
, BackgroundMovie
, and ResidualMovie
(maybe bad, just spitballing again).
> So I guess we've arrived at `ModelMovie`, `ModelBackground`, `ModelResiduals`? @EricThomson
>
> Thinking about this more: one option for the three parts is `NeuralModel` (this is `AC`, right?), `BackgroundModel` (this is going to be `bf` or the ring model of the background), and `Residual` [...] Or even "movie". So something like `NeuralMovie`, `BackgroundMovie`, and `ResidualMovie` (maybe bad, just spitballing again).
I'm not a huge fan of Movie, but Caiman already uses the term, so maybe we're committed to it. I wonder if an interesting first term might be Focus, e.g. `MovieFocus`, `MovieBackg`, `MovieResids`.
> Thinking about this more: one option for the three parts is `NeuralModel` [...] So something like `NeuralMovie`, `BackgroundMovie`, and `ResidualMovie` (maybe bad, just spitballing again).
>
> I'm not a huge fan of Movie, but Caiman already uses the term, so maybe we're committed to it. I wonder if an interesting first term might be Focus, e.g. `MovieFocus`, `MovieBackg`, `MovieResids`.
Actually, good point that these could get confused with the caiman movie object. And one reason I didn't like it is that it slips back to naming by metadata (what it is) rather than by what type of information it contains. That is, the info the three terms contain is the model's predictions of neural activity, the model's predictions of background activity, and the residual (i.e., the stuff the model leaves out, the error term).
> Actually, good point that these could get confused with the caiman movie object. [...] the residual (i.e., the stuff the model leaves out, the error term).
On that narrow point, I don't think it's bad (at all) to name classes partly by implementation details (provided the name doesn't get too long). It's super common in programming to do so; the distinction between an int and a long is entirely implementation details, even though both are meant to hold numbers. There are commonly a few ways to represent something anyhow, and they often should have distinct names based on the how.
> On that narrow point, I don't think it's bad (at all) to name classes partly by implementation details (provided the name doesn't get too long). [...]
Yes, but here we are arguing about space, so we want to eliminate words, and in Python it is considered a code smell to name variables with type flags, e.g., `data_float`. But foolish consistency and all that -- it can sometimes be useful.
Implements #1155
Adds:

1. `LazyArray` base class, which has the relevant parts of the numpy ndarray structure to behave like a numpy array.
2. `LazyArrayRCM`, for the reconstructed movie `A * C`.
3. `LazyArrayRCB`, for the reconstructed background `b * f`. This is actually just a subclass of `LazyArrayRCM` to separate them. I am not sure if it's really necessary; we could have one class, `LazyDenseMovie` or something, which works for both.
4. `LazyArrayResiduals`

These are very useful because they also compute projection images without computing the full dense array.
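As a rough illustration of the idea (a simplified sketch, not the actual implementation in this PR; the class and attribute names are invented), a lazy reconstructed movie only needs `shape`, `dtype`, and a `__getitem__` that computes the requested frames on demand:

```python
import numpy as np

class LazyReconstructedMovie:
    """Sketch: expose A @ C as a numpy-like (n_frames, rows, cols) array."""

    def __init__(self, A, C, dims):
        # A: (n_pixels, n_components), C: (n_components, n_frames)
        self.A, self.C, self.dims = A, C, dims

    @property
    def shape(self):
        return (self.C.shape[1], *self.dims)

    @property
    def dtype(self):
        return self.A.dtype

    def __getitem__(self, indices):
        # compute only the requested frame(s): (n_pixels, t) -> (t, rows, cols)
        frames = (self.A @ self.C[:, indices]).T
        return frames.reshape(-1, *self.dims).squeeze()
```

A viewer can then index `movie[i]` frame by frame while only `A` and `C` live in RAM.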
These should work for visualization in most tools that take numpy-like arrays. It should work in Johannes' ipycanvas-based player, and it's already known to work in `fastplotlib` with random access slicing.

Thoughts:
- Do we want to support saving the dense array to disk, such as in hdf5, which can be lazily accessed? I'm not sure how useful this would be, because the arrays can be huge, in the hundreds-of-gigabytes range.
- I have never worked with 3D data, but this shouldn't be difficult to expand to 3D. Johannes might be the right person to ask?
main todo:
Thoughts and a big-picture review would be useful before a more detailed review.