ejm714 commented 2 years ago

This PR fixes the issue that caused the following error:

RuntimeError: Early stopping conditioned on metric val_macro_f1 which is not available. Pass in or modify your EarlyStopping callback to use any of the following: train_loss

The issue has to do with the split column. When we don't provide it in the labels, we use a random split. But we don't guarantee that we have all of the values from the split_proportions dict (e.g. train and val ) in the split column. With a small number of videos and the default proportions, all videos getting assigned to train (see in splits.csv). Since there are no val videos, there is no val_f1_score.

I considered implementing this without the random draw and instead calculating the number of samples that need to go into each split based on the proportions provided. However, this gets tricky to ensure based on rounding. I instead went with the while loop that tries different seeds until it finds a draw that works. If someone specifies split proportions of train: 100, val: 1, and only provides say 5 videos, there is risk of an infinite loop. I'm not sure how important it is to catch such a situation since 1) we don't expect people to be training on so few videos and 2) it feels hard to accidentally end up in such a situation.

Bonus fix: fix bug in ModelCheckpoint which lets monitor be None if there is no early stopping. mode must either be min or max. I set it to min to match PTL's default but it doesn't get used if monitor is None.

Plus cleanup to set defaults in DummyTrainConfig to avoid repeated code.

github-actions[bot] commented 2 years ago

🚀 Deployed on https://deploy-preview-169--silly-keller-664934.netlify.app

codecov[bot] commented 2 years ago

Codecov Report

Merging #169 (ad38de6) into master (62797d9) will increase coverage by 0.0%. The diff coverage is 100.0%.

@@          Coverage Diff           @@
##           master    #169   +/-   ##
======================================
  Coverage    85.0%   85.0%           
======================================
  Files          30      30           
  Lines        1843    1851    +8     
======================================
+ Hits         1567    1575    +8     
  Misses        276     276

Impacted Files	Coverage Δ
zamba/models/model_manager.py	`84.1% <ø> (ø)`
zamba/models/config.py	`97.1% <100.0%> (+<0.1%)`	:arrow_up:

ejm714 commented 2 years ago

@pjbull ready for another look. this now has the added benefit of doing split allocation within species which is something we had desired but hadn't yet implemented. this should help ensure better training results (in addition to fixing the bug)

ejm714 commented 2 years ago

@pjbull ready for reals this time!

drivendataorg / zamba

Ensure videos are allocated into all specified splits #169

Codecov Report