anayapouget opened this issue 1 month ago
@anayapouget this looks great, happy to help with this. As a side note, to investigate whether we could even run this online: have you ever tried running the distributed version, where you run a SLEAP model for each camera separately and then stitch the resulting tracks (as opposed to stitching the video)?
That would be amenable to GPU parallelism, and we could even try running the entire batch of 4 cameras in a single GPU call.
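To illustrate the batching idea (this is not SLEAP's actual API - `model` below is a stand-in for a per-frame predictor, and the camera names are hypothetical), the single-GPU-call approach could look roughly like:

```python
import numpy as np

def predict_batched(model, camera_frames):
    """Run one model call on frames from all cameras at once.

    `model` is a placeholder for a SLEAP-style predictor mapping a
    (batch, H, W, C) array to per-frame keypoint arrays; `camera_frames`
    maps camera name -> frame array of shape (H, W, C).
    """
    names = list(camera_frames)
    # Stack the quadrant frames into a single batch for one GPU call.
    batch = np.stack([camera_frames[n] for n in names], axis=0)
    preds = model(batch)
    # Split the batched predictions back out per camera for track stitching.
    return {n: preds[i] for i, n in enumerate(names)}

# Dummy "model" returning one (x, y) keypoint per frame, for illustration:
dummy = lambda batch: np.zeros((batch.shape[0], 1, 2))
frames = {f"Quad{i}": np.zeros((8, 8, 1)) for i in range(4)}
out = predict_batched(dummy, frames)
print(sorted(out))  # ['Quad0', 'Quad1', 'Quad2', 'Quad3']
```

The per-camera keypoint dicts could then feed a track-stitching step instead of stitching pixels.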
Thanks for putting this together @anayapouget !
Maybe with some extra work on the model parameters we could make it even better? @lochhh it would be great to discuss this at some point if you have time!
Yep. Automated hyperparameter tuning (e.g. Optuna, Ray tune) is something we can look into as well.
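As a sketch of what automated tuning does (not the actual Optuna/Ray Tune API - just a minimal random-search loop, with a made-up stand-in objective; in practice `objective` would train and evaluate a SLEAP model and return its ID accuracy):

```python
import random

def tune(objective, space, n_trials=50, seed=0):
    """Randomly sample hyperparameters and keep the best-scoring trial."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter from its (low, high) range.
        params = {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Stand-in objective with a known peak (hypothetical parameter names):
def objective(p):
    return -(p["learning_rate"] - 0.01) ** 2 - (p["sigma"] - 5.0) ** 2

space = {"learning_rate": (1e-4, 0.1), "sigma": (1.0, 10.0)}
params, score = tune(objective, space, n_trials=200)
```

Optuna and Ray Tune replace the uniform sampling with smarter strategies (TPE, early stopping of bad trials), but the interface is essentially this: a search space plus an objective.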
Make SLEAP files and do manual labelling. If possible, it would be super helpful if those who are familiar with SLEAP (just @jkbhagatio and @lochhh, I think?) could assist with the labelling. Note that the individual sessions should already be labelled thanks to the Bonsai blob tracking, but they still need to be checked, because the transformation of the top-camera coordinates to the quadrant cameras isn't exact and some points are off to the side of the mice.
We should be able to use existing CameraTop ID models to automatically label social session frames, which we can proofread, and then apply the same CameraTop-to-Quad transformation - this will be easier than manual labelling.
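Applying the CameraTop-to-Quad transformation to predicted keypoints (rather than to pixels) is cheap. A minimal sketch, assuming the transform is available as a 3x3 homography `H` - the matrix below is a made-up placeholder, not a real calibration:

```python
import numpy as np

def transform_points(H, points):
    """Map (N, 2) top-camera keypoints into quadrant-camera coordinates
    via a 3x3 homography: p' ~ H @ [x, y, 1]."""
    pts = np.hstack([points, np.ones((len(points), 1))])  # to homogeneous
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]  # divide out w, back to (x, y)

# Placeholder transform: scale by 2 and translate by (10, 20);
# a real calibration would supply one H per quadrant camera.
H = np.array([[2.0, 0.0, 10.0],
              [0.0, 2.0, 20.0],
              [0.0, 0.0, 1.0]])
top_keypoints = np.array([[100.0, 50.0], [120.0, 60.0]])
quad_keypoints = transform_points(H, top_keypoints)
# e.g. (100, 50) -> (210, 120)
```

Since the transform isn't exact, the mapped points would still be proofread in the SLEAP GUI, as noted above.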
Yes @lochhh, good point - I was planning to modify the code for generating the labelled SLEAP files to use the DJ full-pose ID SLEAP data soon. However, we have hit some setbacks in generating the composite videos (#442)... We'll have to fix this before moving forward, although we can already test automated hyperparameter tuning on my Aeon 3 social 02 dataset if you'd like? That definitely sounds like it would be good to explore!
@anayapouget:
I wouldn't use Aeon 3 social 02, because that was the first one I did and it wasn't done in exactly the same way as the others. You can use Aeon 3 social 03 or 04, though, and Aeon 4 social 02 - those are all good.
After tests on Aeon 3 social 02, we found that a SLEAP ID model trained on the quadrant cameras performs better than the current SLEAP ID model trained on the top camera. This makes sense, since the quadrant cameras are more zoomed in, making the difference between the tattooed and non-tattooed mice easier to pick up on. Below is a comparison of their performance on unseen videos (i.e., videos neither of the models was trained on, specifically 2024-02-25T18-00-00 and 2024-02-28T15-00-00):
| Metric | Top camera ID model | Quadrant camera ID model |
| --- | --- | --- |
| BAA-1104045 accuracy | 0.917 | 0.965 |
| BAA-1104047 accuracy | 0.776 | 0.837 |
| ID accuracy | 0.844 | 0.897 |
| Total tracks | 33536 | 33519 |
| Tracks identified | 33513 | 28051 |
| Tracks correctly identified | 28286 | 22523 |
As a result, we have decided to make a new set of full-pose ID data using quadrant-camera SLEAP models for all arenas and social experiments. The steps we need to take are outlined below:
If, as expected, the performance of the quadrant-camera ID model is consistently better than that of the top-camera ID model, continue on to the next steps.