Addition of analysis of non-FISH signal in regions defined by FISH signal

vreuter commented 5 months ago

Same idea as final step initially developed for DNA DSB project
Keep 1 row per spot, 1 column group per molecule of interest x measurement (e.g., mean RAD51, standard deviation RAD51)
Use, by default, just the central plane +/- 1 $z$-slice
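
A minimal sketch of the "1 row per spot, 1 column group per molecule x measurement" layout, assuming the pixel values for each spot and channel have already been extracted. The function name `spot_row`, the `<channel>_<measurement>` column naming, the channel name `otherSignal`, and the use of population standard deviation are illustrative assumptions, not what looptrace does.

```python
from statistics import mean, pstdev


def spot_row(spot_id, pixels_by_channel):
    """Flatten per-channel measurements into one flat row for a single spot.

    Hypothetical helper: column names are '<channel>_<measurement>'.
    """
    row = {"spot_id": spot_id}
    for channel, values in pixels_by_channel.items():
        row[f"{channel}_mean"] = mean(values)
        # Assumption: population standard deviation; sample SD is equally plausible.
        row[f"{channel}_sd"] = pstdev(values)
    return row


# One spot, two signal channels ("otherSignal" is a made-up channel name).
row = spot_row(7, {"RAD51": [10, 12, 14], "otherSignal": [3, 3, 3]})
print(row)
```

Each additional molecule of interest adds one group of columns, so the row count stays equal to the spot count.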

Checklist:

[x] for regional spots
[ ] for locus-specific spots
[ ] different formats of output (#379)
[ ] proper handling of units of measure
[ ] ensure robustness when a ROI is near an image boundary: https://github.com/gerlichlab/gertils/issues/34
[ ] fine drift correction for signal image(s): #343
[ ] support for multiple timepoints in which signal should be analysed
[ ] simplify the collection of statistics which are computed (in particular, choose just 1-2 ways of dealing with the $z$ dimension, given that fundamentally we're defining the ROI as a 2D entity for the purpose of signal analysis)
[ ] compute statistics for aggregates of ROIs in the same trace ID (#380)
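
For the checklist item about ROIs near an image boundary, one possible policy is to clip the ROI to the image extent along each axis. The sketch below assumes that policy and a hypothetical helper name `clipped_bounds`; it is not a description of how gertils handles this, and discarding or padding truncated ROIs would be equally valid choices.

```python
def clipped_bounds(center, size, extent):
    """Return half-open (start, stop) bounds along one axis, clipped to [0, extent).

    Hypothetical helper: 'size' is the full side length in pixels.
    """
    start = center - size // 2
    stop = start + size
    return max(start, 0), min(stop, extent)


# ROI of side length 5 in an axis of 100 pixels.
print(clipped_bounds(50, 5, 100))  # interior: full 5-pixel span
print(clipped_bounds(1, 5, 100))   # near the low edge: truncated
print(clipped_bounds(99, 5, 100))  # near the high edge: truncated
```

A truncated ROI covers fewer pixels, so it may be worth recording the actual pixel count alongside the statistics.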

vreuter commented 6 days ago

Supersedes #138

vreuter commented 1 day ago

For the locus-specific spots, consider whether or not we're also re-measuring pixel values in regional spots (particularly if the same ROI diameter is used for regional and for locus-specific spots), as well as the units of measure! In particular, we need to ensure that the ROI size is specified with units, and that we convert as necessary based on which of the fields of the locus spot records are being accessed in order to define the location of each region. We should probably prefer to continue to work in pixels for the specification of the ROI size/diameter, since we'll use the region center and size in order to define a region (in pixels) from which to extract pixel values and over which to compute statistics.

ines-prlesi commented 1 day ago

At first glance, I think it would be great to have an "average aggregation" of the IF signal, with the modification that we can specify a +/- N step relative to the z center rather than using the whole z stack at the given ROI. The idea would be that we then get the mean intensity of the signal over +/- 2-3 slices in z. However, depending on what the images look like and on the step size (@TLSteinacker you must know this better than me), maybe it's not even necessary, and then taking the "exact aggregation" would be enough (since this is what I am using for my analyses with the reparafil pipeline and it gives reliable results).

vreuter commented 1 day ago

Thanks @ines-prlesi

Other ideas from discussion:

If a user's specifying a $z$ depth, this could be interpreted either as an absolute value or as a percentage of the total available $z$ depth
If $z$ depth is interpreted absolutely, then care should be taken that units' space is clear/explicit (i.e., image space and units in pixels, or physical space and units as e.g. nanometers)
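
A small sketch of how the two interpretations could be told apart, assuming (purely for illustration) that an integer means an absolute number of $z$-slices in image space and a string ending in `%` means a share of the available depth. The function name `resolve_z_depth` and the rounding rule are assumptions.

```python
def resolve_z_depth(spec, total_slices):
    """Resolve a user-specified z depth to a number of slices.

    Hypothetical convention: int => absolute slice count; 'N%' => percentage
    of the total available z depth, rounded, with a minimum of 1 slice.
    """
    if isinstance(spec, str) and spec.strip().endswith("%"):
        percent = float(spec.strip()[:-1])
        if not 0 < percent <= 100:
            raise ValueError(f"Percentage out of range: {spec!r}")
        return max(1, round(total_slices * percent / 100))
    if isinstance(spec, int) and 1 <= spec <= total_slices:
        return spec
    raise ValueError(f"Cannot interpret z depth {spec!r} for {total_slices} slices")


print(resolve_z_depth(3, 20))      # absolute: 3 slices
print(resolve_z_depth("25%", 20))  # relative: a quarter of 20 slices
```

A depth given in physical units (e.g. nanometers) would additionally need the $z$ step size, which this sketch leaves out.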

TLSteinacker commented 1 day ago

@vreuter thanks for initiating this discussion. @ines-prlesi what parameters are you currently using? Have you tested different ones, and how do they change the relative outcome? I agree that the whole z would not be good, and think that either a selection of z slices (~2-3) or the calculation of the sphere might be preferable.

vreuter commented 1 day ago

Thanks @TLSteinacker, what's implemented for the DNA DSB project doesn't give the user any flexibility, but rather computes all three of these (central slice, average, and max-projection). The reason I'd like to parameterize this, though, is that there are 5 numbers computed for each method (min, max, median, mean, and standard deviation), so the number of values is already relatively large, and for reasons I think we've discussed together or in pairs, the most used of these methods is the central-plane one.

So the "sphere" is a new idea, but of the previous three I think there's consensus to keep the central-plane one while providing flexibility to add a +/- $n$ value, so that in total the 2D ROI area x ($2n + 1$) layers of $z$ form the volume (set of voxels) over which to compute the summary statistics.
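
To make the "2D ROI area x ($2n + 1$) layers" idea concrete, here is a minimal sketch using plain nested lists indexed as `stack[z][y][x]`. The function name `roi_stats`, the half-open bounds convention, the clipping of the $z$ range at the stack edges, and the use of population standard deviation are all assumptions; in practice this would more likely be done with array slicing.

```python
from statistics import mean, median, pstdev


def roi_stats(stack, z_center, n, y_bounds, x_bounds):
    """Summary statistics over a 2D ROI across the central plane +/- n slices.

    Hypothetical helper: y_bounds and x_bounds are half-open (start, stop)
    pairs in pixels; the z range is clipped to the stack.
    """
    z_lo = max(z_center - n, 0)
    z_hi = min(z_center + n + 1, len(stack))
    values = [
        stack[z][y][x]
        for z in range(z_lo, z_hi)
        for y in range(*y_bounds)
        for x in range(*x_bounds)
    ]
    return {
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "mean": mean(values),
        "sd": pstdev(values),  # assumption: population SD
        "count": len(values),  # extra field, just to show the volume size
    }


# Toy stack: 5 slices of 4 x 4 pixels, every pixel in slice z has value 10 * z.
stack = [[[z * 10] * 4 for _ in range(4)] for z in range(5)]
stats = roi_stats(stack, z_center=2, n=1, y_bounds=(1, 3), x_bounds=(1, 3))
print(stats)
```

With a 2 x 2 ROI and $n = 1$, the volume holds 2 x 2 x 3 = 12 values, matching the area x ($2n + 1$) count.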

vreuter commented 1 day ago

@ines-prlesi @TLSteinacker how about having either of the following be valid, then...

Option for sphere, with a single parameter value representing either the radius or diameter of a sphere in $z$
Option for rectangular prism, with single value representing either the "height" of the box (number of $z$ units), or the half-width?

?

At least one would be required, and then either we could allow one-and-only-one, or allow both and then use both methods, differentiating column names by, e.g., a __box or __sphere suffix.

Here could be the format of a valid specification...

{
    "shape": shape,
    dimension: "n px"
}

where $n$ is a positive integer, $shape \in \{"sphere", "box"\}$, and $dimension \in \{"diameter", "sideLength"\}$

This would of course be a particular choice balancing expressiveness/flexibility, user burden, and clarity.

WDYT? Do you prefer something else to differently balance these tradeoffs?

TLSteinacker commented 1 day ago

Sounds like a good plan to me! Regarding ' with single value representing either the "height" of the box (number of z units), or the half-width?', would there not be one value for the xy size and one for z? Or is xy anyway already specified somewhere else?

vreuter commented 1 day ago

would there not be one value for the xy size and one for z? Or is xy anyway already specified somewhere else?

Indeed, sorry, I was operating with the "diameter (xy) is already set" model in my head since that's currently the case, but you're right @TLSteinacker we'd need separate values for xy.

Another thing just occurred to me...we began by saying +/- a margin from the central plane in $z$, and with a diameter in the $xy$ plane. I then started talking about a sphere, but actually we'd have an ellipsoid since there's no constraint that the $z$ depth match the diameter in $xy$. In fact, I acknowledge it's a bit nonsensical to talk this way anyhow, since the $xy$ units are pixels in a way that the $z$ units are not. So how about this...

{
  "x": "<x> px", 
  "y": "<y> px", 
  "z": <z>, 
  "shape": shape
}

where the $x$, $y$, $z$ values are all populated by positive integers, with the "px" added to be clear that it's pixels, not a fixed physical unit. $z$ is left as a scalar to note the difference with the other two. $shape$ must be either "ellipsoid" or "box", and then the values are used accordingly to construct the actual volume of pixels over which summary stats of the pixel values are calculated. This leaves flexibility to define $x$ and $y$ separately in the event (even if rare) that the regions are known/expected to have more rectangular form in $xy$, at the small user burden of adding an extra key-value pair to specify. I'd favor interpreting these (implicitly, without additional specification) as side lengths (not half-widths) in the "box" case, and analogously, axis lengths in the "ellipsoid" case.
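
As a rough illustration of how such a specification could be turned into an actual set of voxels, the sketch below enumerates offsets from the region center for either shape, treating $x$, $y$, $z$ as full side lengths ("box") or full axis lengths ("ellipsoid"). The function name `region_offsets` and the inclusion rule for the ellipsoid (voxel centers on or inside the surface) are assumptions.

```python
def region_offsets(x, y, z, shape):
    """List (dz, dy, dx) offsets from the region center that lie in the volume.

    Hypothetical helper: x, y, z are full side/axis lengths (positive ints).
    """
    if shape not in ("box", "ellipsoid"):
        raise ValueError(f"Unknown shape: {shape!r}")

    def centered(n):
        # Offsets of n voxel centers, symmetric about 0 (half-integers if n is even).
        return [i - (n - 1) / 2 for i in range(n)]

    offsets = []
    for dz in centered(z):
        for dy in centered(y):
            for dx in centered(x):
                if shape == "ellipsoid":
                    # Semi-axes are half of the full axis lengths.
                    r2 = (dx / (x / 2)) ** 2 + (dy / (y / 2)) ** 2 + (dz / (z / 2)) ** 2
                    if r2 > 1:
                        continue
                offsets.append((dz, dy, dx))
    return offsets


box = region_offsets(3, 3, 3, "box")
ellipsoid = region_offsets(3, 3, 3, "ellipsoid")
print(len(box), len(ellipsoid))
```

For a 3 x 3 x 3 specification, the ellipsoid drops the 8 corner voxels that the box keeps, which is the practical difference between the two shapes at small sizes.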

WDYT? @ines-prlesi @TLSteinacker

TLSteinacker commented 12 hours ago

Cool, thanks for clarifying and for pointing out the ellipsoid! would it be helpful to specify z as 'slices' to avoid any ambiguity?

I agree that FWHM would be confusing and would prefer side/axis lengths

vreuter commented 7 hours ago

would it be helpful to specify z as 'slices' to avoid any ambiguity?

Yes, I think that makes it clearer, and saves the confusion of a config file reader thinking that the config file author has mistakenly forgotten the units of $z$ (and then, really badly, erroneously "correcting" it by adding something which would in fact be wrong). So indeed, we can make it so that the $z$ specification must include "slices" for clarity. @ines-prlesi is that OK with you? The "px" suffix is how I've defined pixels, and those parameter values will parse as such.
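
A sketch of what validating such a specification might look like, with "px" required for $x$ and $y$ and "slices" required for $z$. The names `parse_dimension` and `parse_region_spec` are hypothetical, and this is not the project's actual parser for "px" values.

```python
import re

# Hypothetical format: a positive integer, optional whitespace, then the unit.
_DIMENSION = re.compile(r"^\s*(\d+)\s*(px|slices)\s*$")


def parse_dimension(text, expected_unit):
    """Parse e.g. '5 px' or '3 slices' and check the unit is the expected one."""
    match = _DIMENSION.match(text)
    if match is None:
        raise ValueError(f"Not a valid dimension: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    if unit != expected_unit:
        raise ValueError(f"Expected unit {expected_unit!r}, got {unit!r}")
    if value < 1:
        raise ValueError(f"Dimension must be positive: {text!r}")
    return value


def parse_region_spec(spec):
    """Validate a region specification and return plain integer dimensions."""
    if spec["shape"] not in ("ellipsoid", "box"):
        raise ValueError(f"Unknown shape: {spec['shape']!r}")
    return {
        "x": parse_dimension(spec["x"], "px"),
        "y": parse_dimension(spec["y"], "px"),
        "z": parse_dimension(spec["z"], "slices"),
        "shape": spec["shape"],
    }


parsed = parse_region_spec({"x": "5 px", "y": "5 px", "z": "3 slices", "shape": "box"})
print(parsed)
```

Rejecting a $z$ value written with "px" catches exactly the mistaken "correction" described above.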

Increasingly, computations will be done with values already carrying units of measure (and therefore, implicitly, the "dimension" of the quantity, e.g. length, area, etc.), just for safety, so that things like the error with regional pairwise distances having been in pixels rather than nanometers can't happen. I alluded to this in a previous weekly meeting update/spiel, but I'm just reiterating here. We use the config files (or, in the future, the actual image files) to define what a pixel is in physical-space terms, and increasingly we'll be more precise about using that throughout the computations and attaching units to other such parameters.
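
To illustrate the general idea of unit-carrying values (not the project's actual types), here is a minimal sketch in which pixel and nanometer lengths are distinct types and conversion requires an explicit pixel size. The class names, `to_nanometers`, and the 110 nm/px figure are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pixels:
    value: float


@dataclass(frozen=True)
class Nanometers:
    value: float

    def __add__(self, other):
        # Only lengths in the same unit may be added.
        if not isinstance(other, Nanometers):
            return NotImplemented
        return Nanometers(self.value + other.value)


def to_nanometers(length: Pixels, nm_per_px: float) -> Nanometers:
    """Convert a pixel length using an explicit pixel size (e.g. from a config file)."""
    return Nanometers(length.value * nm_per_px)


distance = to_nanometers(Pixels(10), nm_per_px=110.0)  # 110 nm/px is a made-up value
print(distance)
```

Mixing units, e.g. `Nanometers(1) + Pixels(1)`, raises a `TypeError` instead of silently producing a wrong number.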

long story short, yes, we'll

TLSteinacker commented 7 hours ago

sounds good @vreuter, thanks!!

gerlichlab / looptrace

Addition of analysis of non-FISH signal in regions defined by FISH signal #337