AllenInstitute / ophys_etl_pipelines

Pipelines and modules for processing optical physiology data

Investigate stopping condition for iterations in compute reference. #394

Closed morriscb closed 2 years ago

morriscb commented 2 years ago

After AllenInstitute/ophys_etl_pipelines#390, I experimented with different averaging schemes and stopping criteria for the compute_reference initial reference creation method. This ticket will attempt to formally incorporate those ideas and, in turn, remove some of the magic numbers from the code, namely the 8 iterations of reference image creation hard coded in suite2p.
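For context, here is a minimal sketch of the fixed-iteration pattern in question. This is not the actual suite2p code; the `register` callable is a hypothetical stand-in for the real phase-correlation registration step:

```python
import numpy as np

def build_reference(frames, n_iter=8, register=None):
    """Iteratively refine a reference image from a stack of frames.

    `register` stands in for the real registration step; the default
    dummy returns the frames unshifted with uniform correlations.
    """
    if register is None:
        register = lambda frames, ref: (frames, np.ones(len(frames)))
    reference = frames.mean(axis=0)
    for _ in range(n_iter):  # the hard-coded magic number under investigation
        shifted, cmax = register(frames, reference)
        # keep the better-correlated half of the frames for the next average
        keep = cmax >= np.median(cmax)
        reference = shifted[keep].mean(axis=0)
    return reference
```

The loop always runs `n_iter` times whether or not the reference has stabilized, which is what a dynamic stopping condition would replace.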

Timebox this effort so that a decision is made by Wednesday morning at the latest.

morriscb commented 2 years ago

After investigating a variety of schemes for dynamic convergence criteria, we concluded that the gains in reference image quality were marginal at best (both visually and by the image quality metrics we used). Given that, and the fact that this ticket blocks others, we've decided to keep the current algorithm in place with a minor fix: removing an incorrectly 1-indexed array from suite2p and simplifying the data loading, making it faster. I'll summarize what was tried here.

I attempted to use statistics of the reference image created per iteration to define a convergence criterion for making a "good" reference image, rather than running a fixed number of iterations. The two I settled on were the image acutance (the mean of the image gradient) and the image variance. Both turned out to be sensitive to noise, as evidenced by early iterations having high acutance and variance; this is because early references are composed of fewer frames than the final image. They could still serve as indicators that the image had converged, however, since both tended to settle to stable values as the iterations progressed.
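As a rough sketch, the two metrics and a relative-change stopping rule could look like the following. This is numpy-based illustration only; the `rtol` and `window` values are placeholders, not what was actually tested:

```python
import numpy as np

def acutance(image: np.ndarray) -> float:
    """Mean gradient magnitude of the image; higher values indicate
    sharper (or noisier) structure."""
    gy, gx = np.gradient(image.astype(float))
    return float(np.mean(np.sqrt(gx ** 2 + gy ** 2)))

def has_converged(metric_history, rtol=1e-3, window=2):
    """Declare convergence when the metric's relative change over the
    last `window` iterations falls below `rtol`."""
    if len(metric_history) <= window:
        return False
    recent = np.asarray(metric_history[-(window + 1):], dtype=float)
    rel_change = np.abs(np.diff(recent)) / np.abs(recent[:-1])
    return bool(np.all(rel_change < rtol))
```

Variance is just `np.var(image)`, so it needs no helper. The noise sensitivity described above shows up as large early values of both metrics that decay as more frames enter the average.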

I also tried several weighting schemes and quality cuts to improve the reference image quality. These were: computing weighted averages using both cmax (the maximum correlation a given frame has with the current reference image) and cmax^2 as weights. For quality cuts to reject outliers, we applied cuts on cmax in the form of different sigma clips and percentile cuts.
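A minimal numpy sketch of a cmax-weighted average with an optional quality cut (the function name and signature are mine, not from the pipeline code):

```python
import numpy as np

def weighted_reference(frames, cmax, power=2, cut=None):
    """Average registered frames into a reference image, weighting each
    frame by cmax**power and zeroing frames whose cmax falls below `cut`.

    frames : (n_frames, height, width) array of registered frames
    cmax   : (n_frames,) per-frame peak correlation with the reference
    """
    weights = cmax.astype(float) ** power
    if cut is not None:
        weights[cmax < cut] = 0.0  # outlier rejection via the quality cut
    return np.average(frames, axis=0, weights=weights)
```

With `power=1` this is the plain cmax weighting; `power=2` is the cmax^2 weighting described above, which down-weights poorly correlated frames more aggressively.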

The set of criteria I found worked best, shown in the linked images, were: weighting by cmax^2 in all averages, and cutting cmax at either the median, the mean, or by sigma clipping cmax to 1 sigma and using the lower bound of the clip as the cut. These are referred to as cmax2_50, cmax2_mu, and cmax2_clip1 respectively in the plots below. For reference, the current code applies a progressively relaxing cut per iteration, ending with a cut at the median of cmax.
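The three cuts can be sketched as follows. The scheme names match the plots; the sigma clip here is a simplified single-pass version, so it may differ in detail from what was actually run:

```python
import numpy as np

def cmax_cut(cmax, scheme):
    """Return the lower cmax bound for one of the three cut schemes."""
    if scheme == "cmax2_50":     # cut at the median of cmax
        return float(np.median(cmax))
    if scheme == "cmax2_mu":     # cut at the mean of cmax
        return float(np.mean(cmax))
    if scheme == "cmax2_clip1":  # 1-sigma clip, then use the lower bound
        mu, sigma = np.mean(cmax), np.std(cmax)
        clipped = cmax[np.abs(cmax - mu) < sigma]
        return float(np.mean(clipped) - np.std(clipped))
    raise ValueError(f"unknown scheme: {scheme}")
```

Frames with cmax below the returned bound would be excluded from (or zero-weighted in) the average.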

Reference Images

Plotted are 20 low signal-to-noise experiments. The current reference creation code is labeled "standard" here. I also show the value of the acutance ("acu") for the final iteration. The images are extremely similar across methods.

Additionally, I plotted cmax and the motion correction displacements ymax and xmax, along with their absolute distance, dist.

Correlation data

The solid histograms are the frames that pass their respective cuts on the last iteration. The dotted lines are values for the frames that are excluded. All cuts seem to do a good job of excluding outliers (low cmax and/or high ymax/xmax/dist).

I have preserved the code used in the investigatory portion of this ticket here in case we feel the need to revisit reference image creation.

For future work, we could investigate using cmax^2 as a weighting for creating the reference image without modifying the convergence criteria. We should also further investigate the metrics used for convergence and image quality. This could be done by degrading a set of high signal-to-noise data, where we know something close to ground truth, to get a better sense of how these metrics react. This is likely a longer-term project than this ticket allows, hence closing it now.