eth-easl / modyn

Modyn is a research platform for training ML models on growing datasets.
MIT License

Parameterize `drop_last` #550

Closed: XianzheMa closed this 3 months ago

XianzheMa commented 3 months ago

The reason for this PR is that, for IL model training, I don't want to drop the last batch, since the holdout set is already very small (and since no sampling is used during IL model training, there is no harm in keeping the last batch). For main model training, however, we usually default `drop_last` to True.
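For context, this is how `drop_last` behaves in a plain PyTorch `DataLoader`; a minimal sketch of the trade-off described above, not Modyn's actual dataloader code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A tiny dataset standing in for the small IL holdout set (10 samples).
dataset = TensorDataset(torch.randn(10, 4), torch.randint(0, 2, (10,)))

# Main model training: drop the incomplete trailing batch.
main_loader = DataLoader(dataset, batch_size=4, drop_last=True)

# IL model training: keep every sample, including the short last batch.
il_loader = DataLoader(dataset, batch_size=4, drop_last=False)

print(len(list(main_loader)))  # 2 batches (the last 2 samples are dropped)
print(len(list(il_loader)))    # 3 batches (the last batch has only 2 samples)
```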

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 82.84%. Comparing base (72ffdca) to head (9804c85).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #550   +/-   ##
=======================================
  Coverage   82.83%   82.84%
=======================================
  Files         220      220
  Lines       10232    10235    +3
=======================================
+ Hits         8476     8479    +3
  Misses       1756     1756
```

:umbrella: View full report in Codecov by Sentry.

XianzheMa commented 3 months ago

> The changes look good, but they cause a problem in the integration test. When this is fixed, feel free to merge.

The failure is because of a `_assert_data_size` check that was added when the batch accumulation feature was merged. This assertion runs regardless of whether we are doing batch accumulation or not. When we set `drop_last` to False, the last batch naturally has fewer samples and fails the assertion. Let's hope CI passes this time.
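To illustrate the failure mode (a hypothetical sketch; the real `_assert_data_size` in Modyn may look different, and the PR may have resolved it differently): a strict equality check on the batch size breaks as soon as `drop_last=False` lets a short final batch through, so such a check would need to tolerate batches up to the configured size:

```python
# Hypothetical sketch of the assertion problem; names and signatures are
# illustrative, not Modyn's actual code.
def _assert_data_size(expected_batch_size: int, batch: list, drop_last: bool) -> None:
    if drop_last:
        # Every batch must be full, as before.
        assert len(batch) == expected_batch_size
    else:
        # The final batch may legitimately be smaller than the batch size.
        assert 0 < len(batch) <= expected_batch_size

batch_size = 4
samples = list(range(10))  # 10 samples -> batches of size 4, 4, 2

for start in range(0, len(samples), batch_size):
    batch = samples[start : start + batch_size]
    # With an equality-only check, the trailing 2-sample batch would trip
    # the assertion; with drop_last passed through, it is accepted.
    _assert_data_size(batch_size, batch, drop_last=False)
```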