YttriLab / A-SOID

An active learning platform for expert-guided, data efficient discovery of behavior.

CalMS21 results not reproducible from CSV #101

Open N-Cissewski opened 1 week ago

N-Cissewski commented 1 week ago

Hello,

I have tried to reproduce the training results obtained with the NPY files of the CalMS21 dataset using CSV files generated from the same dataset. To do so, I separated the training data into 70 pose files and 70 one-hot-encoded label files for the 70 trials included in the original dataset. I loaded the files into A-SOID using the DeepLabCut option. The poses from the CSV files load correctly: A-SOID recognizes the two animals and the different keypoints. The labels are also loaded in the same way as they are from the original NPY files.

For comparison, this is the feature histogram as loaded from the original NPY file:

Histo_NPY

And this is the feature histogram from the CSV files:

Histo_CSV

As far as I can tell, these are identical.
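
A numerical check would be stricter than comparing the two histograms by eye. The following is only a minimal sketch, assuming both feature sets can be exported as NumPy arrays; the function name `compare_features` and the tolerance are hypothetical and not part of A-SOID.

```python
import numpy as np


def compare_features(a, b, atol=1e-6):
    """Compare two feature arrays element-wise (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Arrays with different shapes cannot be compared element-wise.
    if a.shape != b.shape:
        return {"same_shape": False, "max_abs_diff": None, "allclose": False}

    # NaN entries are ignored for the maximum and treated as equal
    # to each other in the closeness test.
    diff = np.abs(a - b)
    return {
        "same_shape": True,
        "max_abs_diff": float(np.nanmax(diff)),
        "allclose": bool(np.allclose(a, b, atol=atol, equal_nan=True)),
    }
```

Note that matching features alone would not have revealed the problem here, since training still differed although the histograms looked the same.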

Beyond that, I have chosen the same settings that the NPY files use by default (FRAMERATE = 30, LLH_VALUE = 0.6, ITERATION = 0, MIN_DURATION = 0.4, TRAIN_FRACTION = 0.01, MAX_ITER = 100, MAX_SAMPLES_ITER = 40, CONF_THRESHOLD = 0.5, N_SHUFFLED_SPLIT = None).

For the active training step, I get the same number of initial samples to train per class (attack [9.47]; investigation [98.42]; mount [19.05]; other [211.26]).

However, when actually running the training with the CSV files, the results are vastly different from what I get when using the original NPY file.

Training plot from the NPY file: Training_NPY

Training plot from the CSV file with 28 input features: Training_CSV_28

Training plot from the CSV file with 20 input features: Training_CSV_20

The training graph is entirely different between CSV and NPY, regardless of whether the CSV file uses the original data with 28 input features or an alternate CSV that disregards the mice's ears to mimic the way A-SOID behaves when loading the original NPY.

Considering the behaviour up until the actual training is the same for both NPY and CSV, I'm assuming this is the step where the issue occurs. Any help would be greatly appreciated. Thank you in advance.

N-Cissewski commented 5 days ago

Update: It works now. The problem seems to have been missing likelihood columns for each keypoint in the pose file. It might be worth adding a warning when a pose file does not contain likelihood values for the keypoints.
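
For anyone running into the same thing, a check along these lines could catch the problem before loading the files. It is a minimal sketch, not A-SOID code: it assumes the pose table has DeepLabCut-style multi-level column headers whose last level holds `x`, `y` and `likelihood`, and the helper names `missing_likelihood_keypoints` and `add_default_likelihood` as well as the placeholder value of 1.0 are my own assumptions.

```python
import pandas as pd


def _keypoint_prefixes(df):
    """Unique column prefixes (all header levels but the last), in order."""
    prefixes = []
    for col in df.columns:
        if col[:-1] not in prefixes:
            prefixes.append(col[:-1])
    return prefixes


def missing_likelihood_keypoints(df):
    """Return the keypoints that have no 'likelihood' column."""
    missing = []
    for prefix in _keypoint_prefixes(df):
        coords = {col[-1] for col in df.columns if col[:-1] == prefix}
        if "likelihood" not in coords:
            missing.append(prefix)
    return missing


def add_default_likelihood(df, value=1.0):
    """Return a copy with a constant likelihood column per keypoint.

    The constant is a placeholder (assumption): it marks every keypoint
    as fully trusted because no real confidence values are available.
    """
    out = df.copy()
    for prefix in missing_likelihood_keypoints(df):
        out[prefix + ("likelihood",)] = value

    # Keep each keypoint's columns together, ordered x, y, likelihood.
    ordered = [
        prefix + (coord,)
        for prefix in _keypoint_prefixes(df)
        for coord in ("x", "y", "likelihood")
        if prefix + (coord,) in out.columns
    ]
    new_columns = pd.MultiIndex.from_tuples(ordered, names=df.columns.names)
    return out.reindex(columns=new_columns)
```

A multi-animal DeepLabCut CSV would typically be read with something like `pd.read_csv(path, header=[0, 1, 2, 3], index_col=0)` before calling these helpers; the number of header rows is an assumption and depends on how the file was written. A placeholder of 1.0 lies above the LLH_VALUE of 0.6 used in the settings above, but whether a constant likelihood is appropriate for a given dataset is a judgment call.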