AllenInstitute / ophys_etl_pipelines

Pipelines and modules for processing optical physiology data

Examine relationship between Feature Vector Segmenter and ROI seeder #287

Closed · danielsf closed this issue 3 years ago

danielsf commented 3 years ago

The results of segmentation appear to be very sensitive to the seeding step (i.e. the step in which pixels are selected as starting points for candidate ROIs). Currently, the seeding infrastructure we use ranks all of the pixels in a metric image by brightness and keeps the brightest N% of them as potential seeds, ignoring pixels that have already been incorporated into ROIs. We have not rigorously investigated which value of N gives the best results, nor have we considered alternative seeding strategies (e.g. taking all of the pixels that are N-sigma brighter than the population of pixels, either in their neighborhood or in the metric image as a whole).
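For concreteness, here is a minimal numpy sketch of the current top-N% strategy and the N-sigma alternative described above; the function names, signatures, and the use of a boolean ROI mask are illustrative assumptions, not the actual seeder API:

    import numpy as np

    def top_fraction_seeds(metric_img, roi_mask, keep_fraction=0.1):
        # Current strategy (sketch): rank pixels not already assigned
        # to an ROI by brightness and keep the brightest keep_fraction.
        free = ~roi_mask
        values = metric_img[free]
        coords = np.argwhere(free)  # (row, col) pairs, same order as values
        n_keep = max(1, int(keep_fraction * values.size))
        brightest = np.argsort(values)[::-1][:n_keep]
        return coords[brightest]

    def sigma_threshold_seeds(metric_img, roi_mask, n_sigma=3.0):
        # Alternative strategy (sketch): keep every unmasked pixel that
        # is n_sigma brighter than the pixel population of the whole image.
        values = metric_img[~roi_mask]
        threshold = values.mean() + n_sigma * values.std()
        return np.argwhere((metric_img > threshold) & ~roi_mask)

The neighborhood variant of the N-sigma strategy would compute the threshold from a local window around each pixel rather than from the image as a whole.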


danielsf commented 3 years ago

As an example of an experiment that could benefit from improvements to the seeding step, here is the current performance on ophys experiment 806928824 (it is possible that work in #288 could also improve this result).

806928824_comparison_grids.png

The FVS and HNCcorr segmentation pipelines find a dense population of likely bogus ROIs that the legacy segmenter does not find.

danielsf commented 3 years ago

For more context:

When attempting to run FVS out of the box on non-SLC experiments, I experienced many timeouts. This was ultimately because recent changes to FVS make it possible for the seeder to pass the segmenter a seed pixel that has already been masked out as part of an existing ROI. This can be fixed by increasing the seeder_args.minimum_distance parameter so that candidate seeds are more spaced out in the field of view. However, this raises concerns that there may be other race conditions in the code related to the relationship between the batch size provided by the seeder and the number of processors available to FVS.
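As an illustration of the minimum-distance idea (not the actual seeder code; the greedy selection below is an assumption), thinning a ranked seed list so that accepted seeds stay spaced apart looks roughly like this:

    import numpy as np

    def enforce_minimum_distance(seeds, minimum_distance):
        # seeds: (N, 2) array of (row, col) coordinates, ordered from
        # most to least desirable. Greedily accept a seed only if it is
        # at least minimum_distance pixels from every accepted seed.
        accepted = []
        for seed in seeds:
            if all(np.hypot(*(seed - prior)) >= minimum_distance
                   for prior in accepted):
                accepted.append(seed)
        return np.array(accepted)

Presumably, spacing seeds farther apart makes it less likely that a candidate seed has already been absorbed into a neighboring ROI by the time a worker processes it.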

Furthermore, running FVS on experiment 1048483611 with the parameters that were found to be best for SLC experiments results in a segmentation which includes nearly half of the pixels in the field of view (115,745 of ~260,000) in an ROI. I will generate a plot of the ROIs and post it below.

danielsf commented 3 years ago

1048483611_comparison_grids.png

This is the result of segmenting experiment 1048483611 with the following parameters:

SINGULARITY_TMPDIR=${TMPDIR} singularity run \
    --bind /allen:/allen,${TMPDIR}:/tmp \
    ${image} \
        /envs/ophys_etl/bin/python -m ophys_etl.modules.segmentation.modules.feature_vector_segmentation \
    --video_input ${video_name} \
    --graph_input ${graph_name} \
    --n_parallel_workers 88 \
    --seeder_args.keep_fraction 0.1 \
    --seeder_args.minimum_distance 95 \
    --seeder_args.n_samples 320 \
    --attribute filtered_hnc_Gaussian \
    --roi_output ${roi_dir}/deep_denoised_filtered_hnc_Gaussian_${suffix}_rois.json \
    --seed_output ${roi_dir}/deep_denoised_filtered_hnc_Gaussian_${suffix}_seeds.json \
    --plot_output ${roi_dir}/deep_denoised_filtered_hnc_Gaussian_${suffix}_plot.png \
    --seed_plot_output ${roi_dir}/deep_denoised_filtered_hnc_Gaussian_${suffix}_seeds_plot.png

danielsf commented 3 years ago

Having looked more closely at how the Feature Vector Segmenter behaves when the seeder parameters are varied, I do not believe that the seeder is the correct step at which to weed out false positives from our set of candidate ROIs. I think the filter phase, specifically a z-score-over-background filter as illustrated in PR #317, is the correct way to limit the effect of false positives on our results. I will illustrate my thinking below.

Here is an example of an experiment that is plagued by too many false positives.

806928824_0.1_plot.png

We can reduce the number of false positives by reducing the "keep_fraction" parameter (the fraction of pixels that the seeder keeps for consideration) from the default 0.1 to 0.01, like so:

806928824_0.01_plot.png

However, that parameter adjustment has a deleterious effect on the segmenter's performance on other experiments. Here is a more straightforward experiment with keep_fraction=0.1:

785569447_0.1_plot.png

Here is that same experiment with keep_fraction=0.01:

785569447_0.01_plot.png

In this case, the lower value of keep_fraction causes us to miss faint ROIs that are very likely cells. The plots accompanying PR #317 illustrate that filtering ROIs after segmentation can remove false positives without degrading performance on high signal-to-noise experiments. I recommend we do not change the seeder at this point.
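For reference, here is a rough sketch of what a z-score-over-background filter could look like; PR #317 contains the actual implementation, and the function name, the ring-of-background construction, and the 2.0 cutoff below are all illustrative assumptions, not the PR's code:

    import numpy as np
    from scipy.ndimage import binary_dilation

    def roi_z_score(metric_img, roi_mask, n_dilations=5):
        # Compare the mean metric value inside the ROI to the
        # distribution of background pixels in a ring around it
        # (the ring construction is an assumption for illustration).
        ring = binary_dilation(roi_mask, iterations=n_dilations) & ~roi_mask
        background = metric_img[ring]
        roi_mean = metric_img[roi_mask].mean()
        return (roi_mean - background.mean()) / background.std()

    # Hypothetical usage: keep only ROIs that stand out from their
    # local background by more than some cutoff, e.g. 2 sigma.
    # kept_rois = [roi for roi in candidate_rois
    #              if roi_z_score(metric_img, roi.mask) > 2.0]

A filter of this form acts on completed ROIs, so it can remove dim, noise-driven detections without starving the seeder of the faint-but-real seeds discussed above.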