dattalab / keypoint-moseq

https://keypoint-moseq.readthedocs.io

Apply_model does not yield same outputs as the trained model #143

Closed gourie closed 2 months ago

gourie commented 6 months ago

Hi,

First of all, a big thank you and congratulations for sharing this great tool with the community!

I finished training a keypoint-moseq model and applied this pretrained model to new data, but when qualitatively inspecting the syllables in a few videos I ran into consistency issues that raised doubts about whether I am applying the inference function (apply_model) correctly.

To understand better what was going on, I ran a simpler experiment (see below) that yielded inconsistent results when comparing the output of the apply_model function with that of the extract_results function. Can you help me understand why these results differ and how I can robustly apply a previously trained kp-moseq model to unseen data?

Details of my experiment: I compared the syllable results for a single video that was also used to train my model.

1) Right after training the model: load the most recent model checkpoint, then call extract_results() and save the results to csv:

```python
model, data, metadata, current_iter = kpms.load_checkpoint(PROJECT_DIR, MODEL_NAME)
results = kpms.extract_results(model, metadata, PROJECT_DIR, MODEL_NAME)
kpms.save_results_as_csv(results, PROJECT_DIR, MODEL_NAME)
```

2) Load the most recent model checkpoint, then run apply_model() on a single video from the training database (in a new inference project dir, re-using the same config.yml and the same DLC keypoints that were used to train the model):

```python
coordinates, confidences, bodyparts = kpms.load_keypoints(KEYPOINT_NEW_DATA_PATH, 'deeplabcut')
data, metadata = kpms.format_data(coordinates, confidences, **config())
model = kpms.load_checkpoint(MODEL_PATH, MODEL_NAME)[0]
results = kpms.apply_model(
    model, data, metadata, INFERENCE_PROJECT_DIR, MODEL_NAME,
    save_results=True,
    results_path=os.path.join(INFERENCE_PROJECT_DIR, 'inference_results.csv'),
    **config(),
)
kpms.save_results_as_csv(results, INFERENCE_PROJECT_DIR, MODEL_NAME)
```

3) Compare the resulting csv files from 1) and 2) using diff on Linux (or just less/cat both files and inspect them side by side); a sketch of a more quantitative comparison follows below.
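In case it helps, this is the kind of quantitative comparison I have in mind. The helper below is hypothetical (not part of the keypoint-moseq API) and assumes each entry of a results dict holds a per-frame 'syllable' array; the exact key name may differ between versions:

```python
import numpy as np

def compare_syllables(results_a, results_b, recording_name):
    """Compare two syllable sequences for the same recording.

    Hypothetical helper, not part of keypoint-moseq. Assumes each results
    entry contains a per-frame 'syllable' array (key name may vary by version).
    """
    syll_a = np.asarray(results_a[recording_name]['syllable'])
    syll_b = np.asarray(results_b[recording_name]['syllable'])
    n = min(len(syll_a), len(syll_b))
    # Fraction of frames on which the two runs assign the same syllable
    agreement = (syll_a[:n] == syll_b[:n]).mean()
    # Per-syllable usage counts, for comparing histograms
    n_syllables = int(max(syll_a.max(), syll_b.max())) + 1
    hist_a = np.bincount(syll_a[:n], minlength=n_syllables)
    hist_b = np.bincount(syll_b[:n], minlength=n_syllables)
    return agreement, hist_a, hist_b
```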

Unfortunately, the syllables estimated for this single video differed significantly between methods 1) and 2), e.g., the syllable histograms were significantly different (certain syllables that were rarely found in 1) appeared at high counts in the outputs of 2)). I also found that repeating inference step 2) with the same config and DLC keypoints as inputs yielded small differences between two inference runs with v0.2.3 (the syllable histograms didn't differ much, but there were many smaller differences in the estimated syllables across time points) and tiny differences with v0.4.5 (across time points).

Can you help me understand the above results and guide me how to robustly apply a previously trained kp-moseq model to new, unseen videos?

Many thanks,
Joeri

calebweinreb commented 5 months ago

Hi Joeri,

Keypoint-MoSeq learns syllables by fitting a Bayesian model. The syllable sequence that you get as output is (in the best case) a sample from the posterior distribution of that model. Even after the parameters are fixed (e.g., when applying a trained model), there is still a distribution over possible syllable sequences, and therefore it's expected that the precise sequence will differ from run to run. That explains the variation in the results of the inference step.
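To make that concrete, here is a toy illustration (plain NumPy, not keypoint-moseq code): even with a fixed, "trained" transition matrix, sampling a state sequence gives a different result for each random seed.

```python
# Toy illustration (plain NumPy, not keypoint-moseq code): sampling a
# state sequence from a Markov chain with FIXED transition probabilities
# still yields a different sequence for each random seed.
import numpy as np

def sample_states(trans, n_steps, rng):
    """Sample a state sequence from a fixed transition matrix `trans`."""
    n_states = trans.shape[0]
    states = [rng.integers(n_states)]
    for _ in range(n_steps - 1):
        states.append(rng.choice(n_states, p=trans[states[-1]]))
    return np.array(states)

trans = np.array([[0.9, 0.1], [0.2, 0.8]])  # fixed ("trained") parameters
seq1 = sample_states(trans, 1000, np.random.default_rng(0))
seq2 = sample_states(trans, 1000, np.random.default_rng(1))
print((seq1 == seq2).mean())  # well below 1.0 despite identical parameters
```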

On top of this inherent stochasticity, there is also the issue of whether the sampling algorithm converges to the posterior within the number of iterations that you are using. It's possible that running the initial fitting and especially the apply_model steps with more iterations would improve convergence and could make the outputs of steps (1) and (2) more similar.
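For example, something along these lines (assuming your keypoint-moseq version exposes a num_iters argument to apply_model; check the signature of your installed version):

```python
# Sketch: run apply_model with more fitting iterations to improve
# convergence. `num_iters` is assumed to exist in your keypoint-moseq
# version -- check apply_model's signature before relying on it.
results = kpms.apply_model(
    model, data, metadata, INFERENCE_PROJECT_DIR, MODEL_NAME,
    num_iters=500,  # increase from the default
    **config(),
)
```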