MetaCell / salk-interactive-atlas


Create average population in 3D -updated- #248

Open neuroBazzi opened 1 year ago

neuroBazzi commented 1 year ago

@tarelli @stephenlenzi this issue is to track our current (lack of) understanding of Martyn's request to average neuronal centroids in 3D and show them as new centroids. Let's start by noting down our understanding of his request.

stephenlenzi commented 1 year ago

I think it perhaps relates to resident populations, as they want to show a representative subset of each genetic marker. I think they want to use the populations (e.g. V2a_Sp8) as some sort of reference against which to look at their tracing experiments, so maybe what Martyn is after is just a good way to randomly sample from each cord for a given marker.

I would advise against what I think was suggested, i.e. using a heatmap to generate fake coordinates. I would prefer to randomly sample from real data across the multiple individual samples and use that. Not sure it's what they're after.

neuroBazzi commented 1 year ago

@stephenlenzi @tarelli I had a meeting with Martyn and Sofia, and I'm reporting here what we put in focus.

The goal of the task is to create representative populations for each of the four cardinal classes. In every plane, the representative population will contain (approximately) the average number of neurons of that population across their upsampled experiments. For example, if they have four V1 experiments, V1representative (V1r) will have in every plane the average number of neurons across the experiments.

The problem is of course the location of each neuron. What Martyn has in mind is to create a 'density volume' (or probability volume) from which we can extract the most probable location of the neurons, and he would like this location to be unique per population. This means that in any given plane an X, Y coordinate should belong to only one population, to avoid overlap. Neurons will be represented as spheres as we normally do, but their location would represent the most likely set of coordinates based on the experimental data.

These requirements are kinda fuzzy and we probably need to get creative, but I think Stephen already landed on a similar interpretation a while ago. Stephen, what do you think?

This task is data science and is the backbone of what will be displayed in the resident population task (#246).
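For reference, a minimal sketch of the 'density volume' idea, assuming centroids come in as one (N, 3) array per experiment. All names here are hypothetical, and scipy's `gaussian_kde` is just one possible density estimator; it also doesn't enforce the per-population uniqueness constraint, which would need an extra rejection step.

```python
# Minimal sketch: pool centroids per cardinal class, fit a 3D density,
# draw a representative population from it. Names are hypothetical.
import numpy as np
from scipy.stats import gaussian_kde

def build_density(experiments):
    """Pool (N_i, 3) centroid arrays from several experiments of one
    cardinal class and fit a 3D kernel density estimate over them."""
    pooled = np.vstack(experiments)           # (total_cells, 3)
    return gaussian_kde(pooled.T)             # KDE expects (dims, n_points)

def sample_representative(kde, n_cells, rng=None):
    """Draw n_cells candidate centroid locations from the density."""
    return kde.resample(n_cells, seed=rng).T  # back to (n_cells, 3)

# Hypothetical usage for four V1 cords:
# v1_cords = [load_centroids(p) for p in v1_paths]       # each (N_i, 3)
# kde = build_density(v1_cords)
# n = int(np.mean([len(c) for c in v1_cords]))           # average cell count
# v1r = sample_representative(kde, n)
```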

stephenlenzi commented 1 year ago

@neuroBazzi How about something like this:

Here is an image with the atlas, an overlay of the "real" coordinates, and a heatmap of the probability map that we can generate from those coordinates.

[image: atlas with overlay of the real coordinates and the derived probability heatmap]

Here is a sample drawn independently from the heatmap data for this experiment, showing 100 coordinates:

[image: 100 coordinates sampled from the heatmap]

And similarly for 1000 coordinates:

[image: 1000 coordinates sampled from the heatmap]

We can also play around with the smoothing of the heatmap if necessary.
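For the record, here is roughly what that heatmap-then-sample step could look like. This is a sketch with hypothetical names, not necessarily the actual implementation; the bin count and the Gaussian sigma are the smoothing knobs mentioned above.

```python
# Sketch: build a smoothed 2D histogram from real (x, y) coordinates and
# draw new coordinates with probability proportional to the density.
import numpy as np
from scipy.ndimage import gaussian_filter

def heatmap_sample(xy, n_points, bins=100, sigma=2.0, rng=None):
    """xy: (N, 2) real coordinates; returns (n_points, 2) sampled ones."""
    rng = np.random.default_rng(rng)
    hist, xedges, yedges = np.histogram2d(xy[:, 0], xy[:, 1], bins=bins)
    density = gaussian_filter(hist, sigma=sigma)   # the smoothing step
    p = density.ravel() / density.sum()
    idx = rng.choice(p.size, size=n_points, p=p)   # pick bins by density
    ix, iy = np.unravel_index(idx, density.shape)
    # Jitter within each bin so points don't sit exactly on the bin grid.
    x = xedges[ix] + rng.random(n_points) * np.diff(xedges)[ix]
    y = yedges[iy] + rng.random(n_points) * np.diff(yedges)[iy]
    return np.column_stack([x, y])
```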

Regarding the number of points ("average number of neurons across experiments"): does this mean the average of all samples of a given kind (V1cord1, V1cord2) taken for each slice, or the average number of neurons in the entire cord? (The latter probably makes the most sense in terms of the implementation.)

stephenlenzi commented 1 year ago

Using V3 as an example (V3_Cord1, V3_Cord2, V3_Cord3):

Three populations - loaded together, one 3D density map created.

First slice shown for each real sample:

[images: first slice for each of the three real samples]

In this slice, each sample has 47, 23 and 34 cells respectively.

The average number of neurons in the whole 3D volume is then used to determine the number of cell coordinates to be drawn from the density map.

The same slice shown above with new coordinates sampled from the density map:

[image: the same slice with coordinates sampled from the density map]

Contains 36 cells.

Is this what we want, more or less?
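For bookkeeping, a small sketch of how the per-slice counts quoted above (47, 23 and 34 in the real samples, 36 in the sampled population) could be computed, assuming slices are fixed bins along z. `z_edges` and the coordinate arrays are hypothetical.

```python
# Hypothetical helper: count cells per z slice for an (N, 3) array.
import numpy as np

def slice_counts(coords, z_edges):
    """Number of cells falling into each z slice."""
    return np.histogram(coords[:, 2], bins=z_edges)[0]

# raw_avg = np.mean([slice_counts(c, z_edges) for c in (cord1, cord2, cord3)],
#                   axis=0)                    # per-slice average of real data
# rep_counts = slice_counts(sampled, z_edges)  # counts in sampled population
```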

neuroBazzi commented 1 year ago

Hello @stephenlenzi, this is cool; I think we're very close.

One thing we might need to align on is the volume over which we want to calculate the average number of neurons that will end up in every slice. If I understand correctly you want to use the whole cord, but wouldn't that give you the same number of neurons in every slice? If we calculate the average per slice, we will end up with a different number of neurons in every slice, which I think is what we want, as it might represent biological variability 'more realistically'. However, I am not sure I am understanding correctly what you're saying: in the example slice above you say that each sample has 47, 23 and 34 cells respectively, and then that you use 'the average number of neurons in the whole 3D volume', which is 36. But 36 is also very close to the average of 47, 23 and 34, so I'm not sure I'm following :-)

Anyway, I think Martyn would like to use the slice-wise average to fish out neuronal locations from the average density maps (so every slice will have a different number of neurons, unless of course we end up with the same average by chance).

stephenlenzi commented 1 year ago

Just to clarify: what I have done here is to take the average number of neurons in a whole cord, which is in the thousands. This is then randomly sampled according to the density in 3D. I've then shown a single slice and the neurons that happen to fall in that z position. The slice I am showing has 36 neurons, whereas the true average for that slice is 34.

Due to this sampling method, a given slice won't contain exactly the average number of neurons for that slice, but it will reflect to some degree the slices before and after it (so it can be higher or lower than the true slice average, while still reflecting the appropriate 3D volume average). It's probably "more correct" to do it this way, but it's slightly arbitrary anyway given that we are using the manually/artificially upsampled populations as a starting point. My feeling is it is going to come down to client preference in the end.

The alternative implementation would do the same, but the density would be estimated for each slice independently. The number of neurons in each slice would then be exactly the average of the raw data samples.

Anyway, I have 2D and 3D implementations, so we can just use whichever Martyn wants without any significant time cost.
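For concreteness, a hedged sketch of what the per-slice (2D) variant could look like, under the same hypothetical assumptions as the sketches above: each slice's count is pinned to the raw per-slice average, and the (x, y) positions are drawn from a density estimated on that slice alone.

```python
# Hypothetical sketch of the per-slice (2D) variant.
import numpy as np
from scipy.stats import gaussian_kde

def sample_2d(cords, z_edges, rng=None):
    """cords: list of (N_i, 3) centroid arrays; z_edges: slice boundaries."""
    out = []
    for lo, hi in zip(z_edges[:-1], z_edges[1:]):
        slices = [c[(c[:, 2] >= lo) & (c[:, 2] < hi)] for c in cords]
        n = int(round(np.mean([len(s) for s in slices])))  # exact slice average
        pooled = np.vstack([s[:, :2] for s in slices])
        if n == 0 or len(pooled) < 3:        # skip empty/degenerate slices
            continue
        xy = gaussian_kde(pooled.T).resample(n, seed=rng).T
        z = np.full((n, 1), (lo + hi) / 2)   # place points mid-slice
        out.append(np.hstack([xy, z]))
    return np.vstack(out)
```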

neuroBazzi commented 1 year ago

Thanks for the extra details @stephenlenzi, now I understand better. My question then is why you think this implementation is better than estimating the average for every slice independently. My gut feeling is that even if the two solutions are not formally equivalent, the results should be very similar, and I can't think of any corner case in which either solution would be better. Could you please elaborate?

stephenlenzi commented 1 year ago

I guess it's pedantic, but let's say you have some variance in cell count in each slice: drawing from 3D means you're averaging across z to some degree, so my gut feeling is that this should give slightly more accurate results, as you'd be smoothing out that variance. Ultimately I agree with you though, they should be very similar, and either implementation would be sufficient for what they want.

By "better" here I just mean closer to some hypothetical true population of neurons, but if the aim is to match the 2D slices to the raw data, we should go for the 2D implementation.
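To make that intuition concrete, a toy check with entirely synthetic numbers (not project data): each "experiment" is a smooth true z-profile plus slice-wise noise, and a 1D Gaussian filter along z stands in for the averaging a 3D density would do.

```python
# Toy check: does smoothing along z pull the estimate closer to the truth?
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)
true_profile = 30 + 10 * np.sin(np.linspace(0, 3 * np.pi, 80))  # cells/slice
experiments = true_profile + rng.normal(0, 8, size=(3, 80))     # 3 noisy cords

per_slice_avg = experiments.mean(axis=0)                  # 2D-style estimate
smoothed_avg = gaussian_filter1d(per_slice_avg, sigma=3)  # 3D-style stand-in

rmse = lambda est: np.sqrt(np.mean((est - true_profile) ** 2))
print(f"per-slice RMSE: {rmse(per_slice_avg):.2f}")
print(f"smoothed RMSE:  {rmse(smoothed_avg):.2f}")
# With these settings the smoothed estimate lands closer to the true
# profile, which is the "averaging across z" effect described above.
```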

neuroBazzi commented 1 year ago

lol, I don't think it's pedantic. I was picturing the same scenario but interpreting it in the opposite way: the high variance of a given population in a given slice could be biologically relevant (though I can't really think of a case). The second point you mentioned, namely the similarity with the 2D slices, is probably a stronger rationale, as they seem to get stuck on that.