In conversation with @jkim0731, we discussed several parameters to tweak in case Hypothesis 2 is correct.
@morriscb we will be looking into these motion correction issues at the QCathon today to get a better idea of the scope of the issue, but I am wondering if you could also look into the LIMS pipeline implementation details just to sanity check that the saved motion corrected movie files do in fact have the motion correction output applied as expected. As Matt described above, we are seeing somewhat different things when looking at the motion_corrected_preview.h5, which typically looks like the correction worked well, compared to the data in the actual motion corrected movie .h5 files, which appear to have uncorrected or residual motion.
Is it possible that the input to segmentation and subsequent pipeline steps (including creation of the motion_corrected_preview.h5) is properly corrected, but the .h5 file of the full movie that is saved doesn’t actually have the correction applied for some reason? Or is that .h5 file literally the input to all subsequent steps, such that we can be sure that the contents of that file are the ground truth for what the rest of the pipeline sees? Maybe we are just being deceived by the downsampling in the motion_corrected_preview.h5?
Hi both,
@mattjdavis Can you send me the IDs of some of the experiments you're looking at? I want to take a look at the summary plots that the code outputs as well as check out the motion corrected files on disk. I'm not sure how the movie can look fine while the output file does not, as they are both generated from the same data.
To reply to some of the suggestions:
do_optimize_motion_params: It's not enabled by default as it adds a significant amount to the run time, but if you run off-LIMS on a failing experiment, you can check whether the default motion parameters are still near-optimal or need to be changed. It decides on the smoothing values by estimating the sharpness of the image via the acutance at each grid point and selecting the parameters with the maximum acutance.
If max_reg_shift needs to be raised, I would first check the failing raw movies for any motion you can perceive that is greater than 10% of the field of view. If you can find an obvious example by eye, test that experiment off-LIMS; if you see an improvement in the image, you should be good to go. The important thing to keep in mind with this value is that it has a physical meaning (the maximum brain or rig movement), so we need to be able to say we can reasonably expect the motion to get that large.
Hey @mattjdavis, I ran through one of the experiments from the set of example experiments that showed lower quality motion correction. The experiment number is 1171053619. You can find the output of this new run here: /allen/aibs/informatics/chris.morrison/ticket-299.
The previous output from the LIMS run can be found here: /allen/programs/mindscope/production/learning/prod0/specimen_1155072218/ophys_session_1170938336/ophys_experiment_1171053619/.
The difference in the run I did is that I ran through 28 different combinations of smoothing parameters. The combination selected (smooth_sigma=1.15, smooth_sigma_time=0.0) was different than the current default (smooth_sigma=1.65, smooth_sigma_time=2.0). The new run does look a bit better to me from looking at the average image and the movie of the motion correction, though the two motion correction summary plots are largely the same. I'm curious whether the issue you previously found has been fixed with the run I did. I've stored the log for my run in the file adaptive_output.out.
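For context, a minimal sketch of what this acutance-driven parameter search does conceptually (the grid values, the acutance definition, and the register_fn wrapper are illustrative assumptions, not the exact ophys_etl_pipelines implementation):

```python
import itertools

import numpy as np


def acutance(image):
    """Mean gradient magnitude of an image; a simple sharpness proxy."""
    gy, gx = np.gradient(image.astype(float))
    return float(np.mean(np.sqrt(gx ** 2 + gy ** 2)))


def pick_smoothing_params(register_fn, raw_movie,
                          sigmas=(0.65, 1.15, 1.65, 2.15),
                          sigma_times=(0.0, 1.0, 2.0)):
    """Motion correct the movie for each parameter pair and keep the sharpest result.

    register_fn(movie, smooth_sigma, smooth_sigma_time) is any callable wrapping the
    motion correction step and returning the corrected movie (frames x rows x cols).
    """
    scores = {}
    for smooth_sigma, smooth_sigma_time in itertools.product(sigmas, sigma_times):
        corrected = register_fn(raw_movie, smooth_sigma, smooth_sigma_time)
        scores[(smooth_sigma, smooth_sigma_time)] = acutance(corrected.mean(axis=0))
    best = max(scores, key=scores.get)
    return best, scores
```

The pair giving the sharpest average image wins; in the run described above that was (1.15, 0.0).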
Thanks Chris! Awesome to see the parameter search result. I agree, the averages look better in the new runs (I see two now). I am going to double-check the motion corrected files.
Seems the critical param is smooth_sigma_time. It's curious that 0 is working better here than our previous pipeline default (2). The suite2p docs mention that 1 or 2 for this value may be necessary for low SNR (and I think this dataset is low SNR).
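For anyone reproducing this off-LIMS, a minimal sketch of how these smoothing values map onto a suite2p ops dict (the paths, h5 dataset name, and registration-only settings are my assumptions of a reasonable test setup, not the pipeline's exact configuration):

```python
import suite2p

ops = suite2p.default_ops()
ops.update({
    "smooth_sigma": 1.15,      # candidate value (previous pipeline default: 1.65)
    "smooth_sigma_time": 0.0,  # candidate value (previous pipeline default: 2.0)
    "maxregshift": 0.1,        # max rigid shift as a fraction of the FOV
    "roidetect": False,        # registration only for this test; skip detection/extraction
    "input_format": "h5",
})

db = {
    "h5py": "/path/to/raw_movie.h5",  # placeholder path to the raw movie
    "h5py_key": "data",               # assumed dataset name inside the h5
    "data_path": [],
    "save_path0": "/path/to/output",  # placeholder output directory
}

output_ops = suite2p.run_s2p(ops=ops, db=db)
```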
I'm thinking a reasonable next step is to scale up the test to a medium level (say 20-30 experiments) before we unleash it on all the data. Thoughts?
Maybe the improvement with smooth_sigma_time depends on the frame rate? If the frame rate is fast enough to assume monotonic drift (slowly changing in only one direction) within the smoothing window, a higher smooth_sigma_time would help the motion correction; but if the frame rate is slower, the motion could happen in multiple different directions over the smoothing window, making the correction worse. That matches my impression of the current motion correction results, where overall they seem OK (especially after averaging some frames) but there was detectable motion in frame-by-frame inspection.
Hey Matt. Yeah, I ran through a second one too, one that was marked "10" for large motion, and it motion corrected just fine. The way I phrased the 10% shift I realize was slightly wrong, as it's actually a total of 20% of the FoV that would be allowed around a center point (10% left -> 10% right, for instance).
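To make the field-of-view arithmetic concrete, a quick back-of-the-envelope (the 512-pixel frame size is just an example):

```python
# maxregshift is a fraction of the FOV allowed in either direction around center,
# so 0.1 permits roughly +/-51 px on a 512 px frame, i.e. ~20% of the FOV end to end.
fov_pixels = 512          # example frame size
maxregshift = 0.1
max_shift_px = maxregshift * fov_pixels
print(max_shift_px, "px each way,", 2 * max_shift_px, "px total range")
```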
"scale up the test to a medium level (say 20-30 experiments)" That's something we could try if folks want. I would suggest collecting a set of experiments that span signal to noise to try and find a "best" set of parameters for these data. Unfortunately running the search over parameters takes too long to run in LIMS and changing the defaults for processing likely would just mean we change them again for a future experiment. To be clear, finding the best values was done by processing the full movie several times (28 by default), increasing the runtime by roughly that same amount. It's not that bad as it's not writing data out (as in it takes like overnight-ish to run). This code path was done fairly quickly during the SSF work so we haven't optimized it by say running on a smaller dataset that can estimate the result over the full movie. I think that later point would require some development/testing as to how to pull a representative subset. For example, I have seen that some of the LeraningmFish data has trends in signal to noise over the course of the movie so if we just "pulled from the middle of the movie" we would miss this change in SNR.
@jkim0731 Maybe, though I would say it's at least a combination of SNR and frame rate. The data the current default was set on was sampled at 6 Hz (vs this data's 10 Hz, it seems?) and tended to prefer a higher value of smooth_sigma_time.
@morriscb Looking at the data from the two runs, it seems like running the parameter search on a larger set wouldn't add much information. Clearly the time parameter is the most important, and setting it to 0 is likely the way to go. The variance in the other param is so small that it is likely not going to matter much for the actual frame correction. If other folks feel differently, please comment.
Agreed, running on smaller parts of each movie to estimate the parameters would require more dev work.
Therefore I suggest running the following IDs with smooth_sigma_time = 0 and smooth_sigma = pick value (see the acutance-comparison sketch after the ID list). Hopefully these are "good enough" parameters that will work for most of the current dataset. I can compare the acutance value to the existing registration runs and see if we get a boost on the scale observed in the plots above. (There may be a few duplicates.)
1190190096 1165168630 1161494700 1183206298 1171800509 1174074412 1184388022 1156990807 1184841974 1191979392 1167705432 1160406119 1189514810 1160406121 1172483760 1165993826 1174074410 1172017118 1182757436 1160014348 1188653073 1154288461 1131891153 1181944238 1192773607 1165168627 1132811394 1131723119
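As a sketch of that acutance comparison, assuming each run's motion corrected movie is an h5 file with a dataset named "data" (the paths, dataset name, and frame count are assumptions):

```python
import h5py
import numpy as np


def acutance_of_mean_image(h5_path, dataset="data", n_frames=2000):
    """Sharpness (mean gradient magnitude) of the average of the first n_frames."""
    with h5py.File(h5_path, "r") as f:
        mean_img = f[dataset][:n_frames].mean(axis=0).astype(float)
    gy, gx = np.gradient(mean_img)
    return float(np.mean(np.sqrt(gx ** 2 + gy ** 2)))


# Placeholder paths for one experiment's old (LIMS default) and new (fixed-param) runs.
old = acutance_of_mean_image("/path/to/lims_run/motion_corrected.h5")
new = acutance_of_mean_image("/path/to/ticket-299_run/motion_corrected.h5")
print(f"acutance boost: {100 * (new - old) / old:.1f}%")
```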
Seems like we may need the ability to use project-specific parameters in the future (e.g., Learning mFISH vs SSF), especially when big aspects of the data change (like frame rate). How these params are determined will need to be defined as a separate process.
Also, is it possible to get permissions for /allen/aibs/informatics/chris.morrison/ticket-299? I was trying to open the corrected h5 (or copy it to another location) to look at the individual frames and got permission denied (user: matt.davis).
Sure. Sorry, I don't know why some files were readable by all and some weren't. Should be fixed.
Okay, most of the data has finished processing (only one experiment is left). The outputs for each experiment are in appropriately named folders in /allen/aibs/informatics/chris.morrison/ticket-299. I ran them with the grid search still on, as I'm really only worried about the run time if it were used in production. As of now, I don't think there is a way to set a specific set of motion correction parameters for a given project/experiment, so if we wanted to change the runs that happen in LIMS, we'd have to change the defaults for all data. A portion of the experiments are done, with a few failing last night due to infrastructure issues.
I looked through a few of the results and found that while none of the data so far prefers smooth_sigma_time > 0, there are a few experiments that are sensitive to the value of smooth_sigma, with acutances varying by a bit more than 10% in some cases. Not sure how much this translates to final motion correction quality, but it is getting to the level of the differences we previously saw as "worse" when changing smooth_sigma_time > 0. I'll ping once the rest of the data runs.
Thanks Chris! Yep, I see some of the finished files. Awesome, it will be nice to have extra data from the grid search. I can load up the results and see whether the variance in smooth_sigma is associated with other aspects of the data, as well as sample individual frames to get a sense of whether there is any residual motion.
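For the residual-motion spot check, a minimal sketch of one way to do it, assuming the corrected movie is an h5 file with a dataset named "data" (the path, dataset name, and frame stride are placeholders):

```python
import h5py
import numpy as np
from skimage.registration import phase_cross_correlation

h5_path = "/path/to/motion_corrected.h5"  # placeholder

with h5py.File(h5_path, "r") as f:
    frames = f["data"][::100]  # sample every 100th frame; assumed dataset name

reference = frames.mean(axis=0)
# Rigid offset of each sampled frame relative to the mean image; large or drifting
# values suggest residual (uncorrected) motion.
shifts = np.array([
    phase_cross_correlation(reference, frame, upsample_factor=10)[0]
    for frame in frames
])
print("max |shift| (px):", np.abs(shifts).max(), "std (px):", shifts.std(axis=0))
```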
Here we see that smooth_sigma_time = 0 is indeed best for all expt_ids.
Here I fix smooth_sigma_time = 0 and look at the variance over the other param, smooth_sigma. Most experiments have low variance, but a few have high variance. It does look like smooth_sigma = 1.15 is generally best.
Standard deviation of acutance (when smooth_sigma_time = 0). A few experiments do indeed have high variance.
high variance experiments
This table shows the acutance and max-normalized acutance for each experiment at smooth_sigma = 1.15, smooth_sigma_time = 0, where acu_norm = 1.0 is the max acutance value over the whole parameter grid search (100%). Notice how most experiments are at 100%, and the few that aren't (the higher-variance experiments from above) are still greater than 95%.
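For clarity, acu_norm is just each experiment's acutance at the chosen parameters divided by its best acutance anywhere on the grid. A minimal pandas sketch, assuming the grid-search results were tabulated with expt_id/smooth_sigma/smooth_sigma_time/acutance columns (the file and column names are assumptions):

```python
import pandas as pd

# One row per (experiment, smooth_sigma, smooth_sigma_time) with its acutance.
grid = pd.read_csv("grid_search_acutance.csv")  # placeholder file

chosen = grid.query("smooth_sigma == 1.15 and smooth_sigma_time == 0.0").set_index("expt_id")
best = grid.groupby("expt_id")["acutance"].max()

table = chosen[["acutance"]].copy()
table["acu_norm"] = table["acutance"] / best  # 1.0 means the chosen params were the grid optimum
```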
I conclude that smooth_sigma = 1.15 and smooth_sigma_time = 0 are good enough (and most often the best) param values for the LAMF dataset.
TODO:
@morriscb could you open permissions for /allen/aibs/informatics/chris.morrison/ticket-299 again?
Huh, don't know what changed to stop it. You should have it now.
@morriscb After discussions on our team, it seems that rerunning the pipeline (registration + segmentation/extraction/etc.) on all the Learning mFISH data is the next step. Will you need a list of experiment IDs to facilitate this?
smooth_sigma = 1.15 and smooth_sigma_time = 0 seem to be the way to go for this run.
Thinking long term, you mentioned that adaptive motion correction is likely intractable for the pipeline or for large-scale reruns of the data. So we will likely need a strategy for finding the right parameters when new projects with different data characteristics (frame rate, acquisition parameters) are introduced: possibly do what we did here (grid search on a small set of experiments) and store project-specific params for the pipeline to use. I would be happy to contribute to those conversations in the future.
From my perspective, everything in the LearningmFISHTask1A project code is OK to reprocess.
Hey @mattjdavis. Glad the results are satisfactory when the smoothing parameters are changed.
The adaptive motion correction is the grid search we just did. Right now it runs over the full movie to compute the acutance of the average image and decides on the best set of smoothing parameters based on which is sharpest. If we want to run this in production, we'll have to do a bit of work to convince ourselves that running on a subset of the data reproduces the same relative results. Loading the full movie, processing it, and then doing that several times is the issue here. I've started on this by updating the code surrounding the grid search and am currently waiting on a code review there. Once that's done, I can start testing how effective grabbing chunks of frames at, say, the start, middle, and end is at reproducing the same "best" smoothing (see the sketch below). That's at least my plan for now.
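A minimal sketch of the kind of frame subsetting described above (the chunk positions, chunk length, and h5 dataset name are illustrative assumptions; the real ophys_etl_pipelines implementation may differ):

```python
import h5py
import numpy as np


def sample_chunks(h5_path, dataset="data", chunk_len=500):
    """Pull contiguous chunks from the start, middle, and end of the raw movie.

    The idea is to run the smoothing-parameter grid search on this subset and check
    that it picks the same "best" parameters as the full movie, while still catching
    SNR trends over the session (hence sampling more than one location).
    """
    with h5py.File(h5_path, "r") as f:
        n_frames = f[dataset].shape[0]
        starts = [0, (n_frames - chunk_len) // 2, n_frames - chunk_len]
        chunks = [f[dataset][s:s + chunk_len] for s in starts]
    return np.concatenate(chunks, axis=0)
```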
One of the reasons I'm pushing to work on the above is that (as far as I know) there is no way to change the config parameters in LIMS for a specific experiment. In order to change the smoothing parameters for this project, we would have to change the global defaults for all projects and be left in the same situation again when other data is re-run or new data is uploaded. Hence the push to get the adaptive/grid-search smoothing up and running in production. The only way I can see to do it for a single project would be to create a new LIMS queue for the purpose, which seems a bit overkill.
Hi @morriscb, testing subsets of the movie for adaptive motion correction sounds awesome! I understand the issue with project-specific params in the LIMS queue system. I'm just checking in on this issue now, being back from break and the summer workshop. At some point soon we will need to get all the data rerun through the pipeline, as all our downstream analyses are blocked by the contaminated motion. Thanks!
@morriscb I understand that LIMS can't handle project- or experiment-specific parameters in the main config settings, but is it possible to do a one-off run of motion correction on a specific set of experiments with a specific set of parameters, without changing the global config? Similar to when you have "re-run" processing for a batch of learning project datasets in the past (like with suite2P segmentation)?
I am asking because we have a goal to fully validate the dataset and all the processing steps by the end of the year (which is rapidly approaching), and the motion correction failures are blocking all of this, so it’s pretty critical that we re-run motion correction with the new parameters ASAP so that we can move on to subsequent processing steps (ex: suite2P segmentation and segmentation classifier), even if it’s a one-off thing.
If this is possible, please go ahead and re-run everything in LearningmFISHTask1A and LearningmFISHDevelopment with the updated parameter set that you and Matt figured out via the grid search.
I understand that any new data that is acquired would still be processed using the old params (at least until your plan to do adaptive grid search on a per experiment basis is put into place), so we may have to do this reprocessing once more in the future, but we can wait a few months until another large chunk of data is collected to avoid having to do this frequently.
Hey @matchings, I just got back from vacation and ready to get back to this.
Thanks for clarifying. Unfortunately, I don't think there is a way to do what you describe above. When I've "re-run" processing, it was with the master configurations for all pipeline steps at the time. There isn't a way to just run a special set of processing in the way we need without fully creating a new LIMS queue.
I chatted with the other Pika folks and I think we are happy to change the default configuration for this processing for the time being. This is assuming that there aren't other projects currently running the ophys pipeline; I think that would be a question for you, Marina. Is there currently another project that is submitting that we may interfere with if we change the defaults? If not, then I think we can change the defaults and get things running. I'll change the defaults and relaunch the processing once I hear back from you.
That said, while you all were at the workshop, I was testing out the adaptive motion parameter code and was able to get it into a state where things seem to process quickly enough to be run in LIMS. I haven't fully checked all the results yet (I will do so over the next few days), but hopefully this will be the last time we have to change the defaults for a specific project.
Hope SWDB went smoothly!
Hey both,
Based on a conversation with Marina and Pika over email, it looks like we can move forward with re-running the LearningmFish experiments through motion correction and beyond. I've updated the defaults in the ophys_etl_pipelines code. Just want to give warning and confirm with you that this will end up changing the ROI and cell_specimen ids upon re-submission of the jobs to LIMS. I've found the list of experiments and containers that Marina sent to me previously. Wanted to check in quickly before I submit the jobs and confirm that @matchings you are still okay with re-running as mentioned above.
Hey @morriscb, I asked around about what other datasets might be affected by changing the motion correction params, and it sounds like you might also be doing some reprocessing of MultiscopeSignalNoise and/or TaskTrainedNetworks datasets right now? If so, it might make sense to wait to make any changes until after those datasets are finished, so that we don't mess with anything there.
There are also some Openscope datasets that could be affected, so we might want to do an off-pipeline test of the new parameters with some of those to make sure it won't cause problems there either. Or perhaps all project codes that are actively collecting data should simply be re-run through motion correction once the adaptive parameter search is implemented, so that everything is consistent. We don't want to create a scenario where one half of a dataset was run with some parameters and the other half with another. In any case, I think this warrants a broader discussion, so I will bring it up at the TRAC meeting tomorrow.
In general, I think we need a way to track what version of processing code was run and what parameters were used on any given experiment so that we can evaluate these things and directly link the outputs to the run settings. Is there a way to associate this information with the processing output in such a way that it is always traceable? Maybe this already happens and I just didn't know?
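One lightweight way to get that traceability, purely as a sketch of an approach rather than something the pipeline currently does, would be to drop a small JSON sidecar next to each motion corrected output:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_provenance(output_dir, params, code_version):
    """Record the parameters and code version used for a processing run."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "code_version": code_version,            # e.g. an ophys_etl_pipelines tag or Docker image digest
        "motion_correction_params": params,
    }
    sidecar = Path(output_dir) / "motion_correction_provenance.json"
    sidecar.write_text(json.dumps(record, indent=2))


write_provenance(
    "/path/to/ophys_experiment_output",                 # placeholder output directory
    {"smooth_sigma": 1.15, "smooth_sigma_time": 0.0},
    code_version="ophys_etl_pipelines@<tag>",           # placeholder version string
)
```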
Hey Marina, cool, good to know. For full context, we changed the smoothing parameters previously when we were working on SSF with only input from the team, so the configs have changed underneath folks once already. I've already merged the latest change to the parameters, so perhaps we'll have to re-run Openscope at some point. The issues with MultiscopeSignalNoise/TaskTrainedNetworks are after motion correction, so hopefully we won't have to re-run them. I don't think config parameters are tracked anywhere centrally in LIMS, but they are available next to the motion correction file in the output from the SLURM jobs. I'll submit the motion correction jobs shortly for LearningmFish unless there is any objection. My plan is to resubmit all the experiments for the project.
Thanks Chris, that all sounds good. Go ahead and submit the learning mFISH experiments. Here is the most up to date list of experiments to process:
Just to confirm, we intend to rerun all experiments in LearningmFISHTask1A and LearningmFISHDevelopment (which includes additional experiments/containers since last time). Do you need the experiment IDs or is the project code sufficient? (EDIT: see Marina's comment)
@matchings what about omFISHGad2Meso, since we are looking at 210 data now? (There are other omFISH project codes too, but they are maybe less relevant to us.)
Oh, good point about the omFISH data. We should probably re-run those as well. Here is a list of experiments to run that includes LearningmFISHTask1A, LearningmFISHDevelopment, and omFISHGad2Meso:
Do we know when the updated params will be pushed to LIMS production? Since we are currently collecting data, we may miss a few experiments. We can note those and submit the job again, when we have clarity.
The updated params were pushed yesterday afternoon. After changing the code in ophys_etl_pipelines, it basically takes until the merged code produces a new Docker image, which LIMS then pulls for processing.
Thanks for the updated ids! To answer your question Matt, the Reprocessing tab of LIMS requires IDs to be resubmitted, not project names; in our case this means experiment or container ids. Speaking of which, can you send me an updated set of container ids as well, Marina? Thanks!
@morriscb There should be an ophys_container_id column in that same .csv with the relevant list of container IDs to process.
Ah gotcha,
In this comment (https://github.com/AllenInstitute/AllenSDK/issues/2532#issuecomment-1247208674) we just discovered that some of the recent experiments are not in the LIMS metadata experiment table that Marina used to pull the list of experiments, so we will have to enter those manually or rerun them later.
We have spot-checked the data after re-running and are satisfied with the motion correction.
Describe the Issue: Learning mFISH mice have .h5 motion corrected movies that are clearly not well motion corrected (see examples below). I have posited two possible causes.
Hypothesis 1: Pipeline processing error. Curiously, the suite2p_motion_corrected_preview.mp4 movie indicates that motion correction was actually applied to the data at some point in the pipeline. Perhaps the h5 output was erroneously created with raw data. Thus, our average projections and max projections are affected and look blurry/low SNR. It is unclear whether segmentation was run on raw or motion corrected data.
It's possible this is restricted to new pipeline development work, which was run on a few of the LAMF mice.
Hypothesis 2: Low SNR data is harder to correct, and params need to be adjusted (additional context to be added).
Expected behavior: Motion corrected .h5 outputs should actually be motion corrected.
Screenshots: See examples here. These are 1000-frame movie samples taken from the "suite2p motion corrected h5" and saved as tiff stacks: //allen/programs/mindscope/workgroups/learning/mattd/motion_correction_check
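For reference, a minimal sketch of how such spot-check stacks could be produced (the paths and the "data" dataset name are assumptions):

```python
import h5py
import tifffile

src = "/path/to/suite2p_motion_corrected.h5"                      # placeholder input
dst = "/path/to/motion_correction_check/sample_1000_frames.tif"   # placeholder output

with h5py.File(src, "r") as f:
    sample = f["data"][:1000]  # assumed dataset name; first 1000 frames

tifffile.imwrite(dst, sample)
```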
Scope: n/a
Additional context: n/a
TODO