brainglobe / cellfinder

Automated 3D cell detection in very large images
https://brainglobe.info/documentation/cellfinder/index.html
BSD 3-Clause "New" or "Revised" License

For 2D filtering in the detection step, option for a fixed threshold instead of the n_sds_above_mean_thresh #449

Closed xmikezheng20 closed 1 month ago

xmikezheng20 commented 1 month ago

Is your feature request related to a problem? Please describe.

Hi cellfinder developers,

Thank you for developing cellfinder. I have been trying to use it and the brainglobe ecosystem. I really appreciate your well-documented packages!

I'm trying to run cell-detection on some lightsheet data. To understand the meaning of the parameters, I started looking at the source code, and I have a question. I see that the first step in detection is 2D-filtering of each plane using TileProcessor. And I see that the threshold is relative to each plane (mean + n_sds_above_mean_thresh * sd):

```python
thresholded_img = enhance_peaks(
    plane.copy(),
    self.clipping_value,
    gaussian_sigma=laplace_gaussian_sigma,
)
avg = np.mean(thresholded_img)
sd = np.std(thresholded_img)
threshold = avg + self.n_sds_above_mean_thresh * sd
plane[thresholded_img > threshold] = self.threshold_value
```

While using n_sds_above_mean_thresh definitely makes it easier to generalize across samples, computing the statistics separately for each plane will probably lead to different planes within a sample having slightly different thresholds. I'm not sure this is a behavior I want all the time.

For example, suppose there's a sample where brain region A has some dim signal spanning lots of planes. In plane i, there is no other signal elsewhere, so the threshold is rather low, and the cells in region A in plane i cross that threshold and get detected. In plane j, there is some bright signal elsewhere, making the threshold high, and the cells in region A in plane j don't pass that threshold. I think ideally whether cells in region A get detected in any plane shouldn't necessarily depend on the signal level in other parts of the plane.
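To make this concrete, here is a small self-contained sketch (not cellfinder code; the synthetic planes, the helper name `relative_threshold`, and `n_sds = 3` are all illustrative choices) of how a bright patch elsewhere in a plane can push the per-plane threshold above a dim cell's intensity:

```python
import numpy as np

def relative_threshold(plane, n_sds):
    # Per-plane threshold in the style of the 2D filtering step:
    # mean + n_sds * standard deviation (computed on this plane only).
    return plane.mean() + n_sds * plane.std()

rng = np.random.default_rng(0)
n_sds = 3  # illustrative; cellfinder's default may differ

# Two planes with identical background noise and the same dim cell (value 60)
background = rng.normal(loc=10, scale=2, size=(100, 100))

plane_i = background.copy()
plane_i[50, 50] = 60  # dim cell in "region A"

plane_j = background.copy()
plane_j[50, 50] = 60          # same dim cell
plane_j[10:20, 10:20] = 500   # bright signal elsewhere inflates mean and sd

t_i = relative_threshold(plane_i, n_sds)  # roughly mean 10 + 3 * sd 2 ≈ 16
t_j = relative_threshold(plane_j, n_sds)  # sd is now ~50, threshold > 100

print(plane_i[50, 50] > t_i)  # True: dim cell crosses the threshold in plane i
print(plane_j[50, 50] > t_j)  # False: same cell misses the threshold in plane j
```

(The real pipeline computes the statistics on the Laplacian-of-Gaussian-enhanced image rather than the raw plane, but the plane-relative effect is the same.)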

Am I understanding the code correctly? Do you think this is a valid concern? Am I missing something?

Describe the solution you'd like

If my understanding of the code is correct... I think using n_sds_above_mean_thresh per plane as the default makes a lot of sense. But I also think having the option to use a uniform fixed threshold value across all planes of a sample could be useful when the user wants more manual control over the detection step for a particular sample.
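A minimal sketch of what such an option could look like (this is hypothetical, not cellfinder's actual API; the function name and `fixed_threshold` parameter are invented for illustration, and the `enhance_peaks` step is omitted for brevity):

```python
import numpy as np

def threshold_plane(plane, n_sds_above_mean_thresh=3.0, fixed_threshold=None):
    """Return a boolean mask of above-threshold pixels.

    If fixed_threshold is given, use the same absolute cutoff for every
    plane; otherwise fall back to the per-plane mean + n_sds * sd rule.
    """
    if fixed_threshold is not None:
        threshold = fixed_threshold  # identical across all planes
    else:
        threshold = plane.mean() + n_sds_above_mean_thresh * plane.std()
    return plane > threshold

# Usage: a flat background of 10 with one bright pixel of 100
plane = np.full((10, 10), 10.0)
plane[5, 5] = 100.0

mask_relative = threshold_plane(plane)                       # per-plane rule
mask_fixed = threshold_plane(plane, fixed_threshold=50.0)    # uniform cutoff
```

The key point is that with `fixed_threshold`, whether a pixel passes no longer depends on the rest of the plane.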

Thank you in advance for reading and considering this!

Best, Mike

alessandrofelder commented 1 month ago

Hi Mike,

Thanks for getting in touch and writing such a thorough issue. Yes - I think you understand the code correctly.

> Do you think this is a valid concern?

I think in the typical use case one doesn't care too much about this, because the detection step should include as many true positives as possible (ideally all of them, even at the expense of including false positives!) - the idea is to filter out the false positives during the subsequent classification step. It might be a concern if you want to do candidate detection without classification and therefore need to strike a tight balance between true and false positives.

Is your question more about theoretical understanding, or is this a practical concern for your lightsheet data too?

xmikezheng20 commented 1 month ago

Thank you so much for your reply!

I see. At the moment, I've only imaged a small sample (~1/4 of a mouse brain) to test an antibody and prototype an analysis pipeline. The antibody labels neurons in a small set of brain regions. I was trying to understand how the detection parameters affect the detection output by running something like mp_tile_processor.get_tile_mask(my_plane) on individual 2D planes for the 2D filtering step. I found that the 2D filtering on different individual planes gave me somewhat unexpected results - the n_sds_above_mean_thresh that worked super well for one plane didn't work very well for another. This was what led to my question...

From what you said and the documentation, I feel that running through the full detection + classification pipeline should definitely add robustness. I also suspect having full-brain volumes, once I get them, will help too. And perhaps it depends on the labeling as well - something like cFos will probably have more uniform signal across different planes and therefore be less affected by this issue. I was just wondering whether it would help to have the option to optimize each step of the pipeline.

alessandrofelder commented 1 month ago

Yea, I think even the subsequent 3D filtering will smooth out the effect you see in 2D a bit.

Cool - let us know how you get on - we're always interested in hearing from users. I'll close this for now - but I am happy to re-open if you or anyone else can make a good practical case for this feature 🙂 thanks for asking the question!