I revisited the centroid calculation to try to improve centering of soma specifically for large soma-dendrite ROIs.
As a reminder, we currently use the mask to calculate the centroid. We then center the ROI on this centroid before feeding it to the classifier.
I tried calculating the centroid from the correlation projection, weighting each pixel by its intensity in the correlation projection. The idea is that the soma is brighter than the rest of the ROI, so the weighting pulls the centroid closer to the soma.
I found that, compared to the mask, this helped, although the centroid was still too far from the center of the soma when the dendrite was long. I then experimented with zeroing out pixels dimmer than a certain brightness quantile, which effectively pulled the centroid onto the soma and worked well.
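A minimal sketch of the calculation (numpy only; the function name, argument names, and the 0.8 default are mine, not taken from the codebase):

```python
import numpy as np

def weighted_centroid(mask: np.ndarray,
                      corr_projection: np.ndarray,
                      brightness_quantile: float = 0.8) -> tuple:
    """Intensity-weighted centroid of an ROI.

    ROI pixels dimmer than the brightness quantile are zeroed out so the
    centroid is pulled toward the bright soma.
    """
    # Restrict the correlation projection to the ROI mask
    weights = np.where(mask, corr_projection, 0.0)

    # Zero out ROI pixels dimmer than the brightness quantile
    threshold = np.quantile(corr_projection[mask], brightness_quantile)
    weights[weights < threshold] = 0.0

    # Intensity-weighted centroid (row, col)
    rows, cols = np.nonzero(weights)
    pixel_weights = weights[rows, cols]
    return (np.average(rows, weights=pixel_weights),
            np.average(cols, weights=pixel_weights))
```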
The following plot shows the centroids I tried: using the mask, using the correlation projection with no filter, and using the correlation projection with pixels dimmer than brightness_quantile filtered out. These examples were taken as the most extreme cases of the soma being far from the center, which I identified using the bounding boxes collected in #12. The masked-and-filtered (dim pixels zeroed out) centroid is the best in all cases. I'm also showing the classification scores.
Yellow: baseline, using mask to calculate centroid
Blue: using masked correlation projection
Red: using only pixels brighter than the 0.8 quantile in the masked correlation projection
This looks good for these large processes. However, when I applied the filtering of the correlation projection below the 0.8 quantile to all ROIs, I got unexpected results. The classifier is extremely sensitive to position: its predictions changed substantially when the position moved by only a very small amount. Since the model was trained on data centered using the mask centroid, it must have learned to expect a particular centering of the ROI, and even a tiny shift throws off the classifier. In my opinion this is not good behavior; the classifier should be robust to small changes in the position of the input.
Plot showing the change in classification score when the position shifted slightly due to the recalculated centroid. Also note that the centroid is better for non-large-process ROIs when we use the mask. Colors are the same as above.
Because zeroing out dim pixels improves the centroid only for large processes, I looked at the distribution of bounding box centers collected in #12 to find a cutoff for deciding when the soma is at an outlier distance from the center.
I picked 12 as an extreme outlier.
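Roughly, the gating could look like the sketch below (hypothetical names; exactly which point the distance is measured from is my assumption based on the description):

```python
import numpy as np

# 12 is the cutoff picked from the distribution of bounding box centers
OUTLIER_DISTANCE_CUTOFF = 12.0

def choose_centroid(mask_centroid, corr_weighted_centroid,
                    filtered_centroid, frame_shape):
    """Keep the baseline mask centroid unless the correlation-weighted
    centroid sits an outlier distance from the frame center; only then
    fall back to the quantile-filtered centroid."""
    frame_center = np.array(frame_shape) / 2.0
    distance = np.linalg.norm(np.asarray(corr_weighted_centroid) - frame_center)
    if distance > OUTLIER_DISTANCE_CUTOFF:
        return filtered_centroid
    return mask_centroid
```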
The following plot compares the baseline with the new logic for the 10 biggest changes to the classification score. Note that only the first 2 show significant changes, meaning that this change affects very few ROIs.
Overall CV performance on SSF only:
baseline: 0.935 precision, 0.974 recall
new logic: 0.936 precision, 0.979 recall
Sorry. I'm not clear what the horizontal axis is in that histogram. To what statistic are you applying a cutoff in the new logic?
We should be careful about relying on the contrast in the correlation projection image for this logic. Recall: the contrast in the correlation projection changes dramatically when we adopt fine-tuning.
I know we haven't converged on a fine-tuning pipeline yet, but I suspect that this behavior will prove to be real. Fine-tuning suppresses noise in the background, which means the background pixels become as correlated with each other as the ROI pixels are.
I'm also concerned about your statement that "the classifier is extremely sensitive to [ROI] position." Do we have a plan to retrain the classifier so that it is not so sensitive? I worry that, whatever scheme we come up with for centering the thumbnail on the soma, we will inevitably come up against a cell that is so intricate that it ends up being mildly off-center in the thumbnail.
@danielsf
> Sorry. I'm not clear what the horizontal axis is in that histogram. To what statistic are you applying a cutoff in the new logic?
I took the centers of the bounding boxes which I manually annotated around ~650 somas and calculated the distance of each one to the center; this is the distribution shown. The cutoff is applied to how far the centroid calculated using the correlation projection as weights is from the center.
> I'm also concerned about your statement that "the classifier is extremely sensitive to [ROI] position." Do we have a plan to retrain the classifier so that it is not so sensitive? I worry that, whatever scheme we come up with for centering the thumbnail on the soma, we will inevitably come up against a cell that is so intricate that it ends up being mildly off-center in the thumbnail.
If you look at the last plot you can see that the model is sometimes robust to changes in position. I'm not exactly sure what causes it to be robust in some cases and not in others. Overall, though, the logic I've implemented, which prevents recalculating the centroid unless it is an outlier distance from the center, improves overall performance.
I'm concerned about it too, as I would expect the model to be more robust to changes in position. The issue is complex, though, and might require a deeper network with more max pooling layers (which I tried but didn't adopt because it reduced overall performance), so this is a nontrivial problem.
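For illustration only (the real architecture isn't shown in this thread, and this assumes a PyTorch-style model with arbitrary channel counts), the kind of change being referred to is adding convolutional blocks that end in max pooling, since each pooling stage makes the features less sensitive to small shifts of the input:

```python
import torch.nn as nn

# Each extra MaxPool2d halves the spatial resolution, so a small shift of
# the ROI in the thumbnail moves the pooled features by less than one cell,
# which tends to reduce sensitivity to position.
extra_block = nn.Sequential(
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)
```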
@danielsf if we expect the soma to have higher intensity than the dendrite in the correlation projection after fine-tuning, then this logic would still work to improve the centroid calculation for the soma.
Similar to #10, #11, #12.
We currently calculate the centroid of the mask in order to center the ROI. The problem is that this centroid will be pulled towards the dendrite if the dendrite is long.
We can try to exploit the fact that the soma will have higher intensity in order to weight the centroid towards the soma.
Idea: use the correlation projection / max projection as the centroid weights.
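A toy 1-D illustration of the pulling effect (made-up intensities, just to show the direction of the change):

```python
import numpy as np

# A bright soma at one end and a long, dim dendrite trailing off
intensity = np.array([10., 9., 8., 1., 1., 1., 1., 1., 1., 1.])
positions = np.arange(len(intensity))

mask_centroid = positions.mean()                              # 4.5, pulled down the dendrite
weighted_centroid = np.average(positions, weights=intensity)  # ~2.0, near the bright soma
```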