Twin RHO Model Step 1: create the Twin RHO Model

XianzheMa commented 2 weeks ago

This is the first PR implementing an alternative way of producing the holdout set, the IL model, and the irreducible loss (typically suitable for small datasets):

Split the training set into two halves and train two IL models, one on each half. Each model provides the irreducible loss for the samples it was not trained on. The main model is still trained on the original training set (CIFAR10, CIFAR100, CINIC-10).

Our current architecture only allows one trigger id to correspond to one model id. To accommodate two IL models within one trigger, I created a "twin model" that internally consists of two IL models. During training, each IL model memorizes the ids of the samples it has seen, so that during evaluation each sample is scored by the IL model that has not seen it.
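The routing idea can be illustrated with a minimal sketch. This is not the code in this PR: the class `TwinILSketch`, its methods, and the toy "model" (the mean of the training targets, scored with squared error) are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from typing import Dict, List, Set


class TwinILSketch:
    """Hypothetical twin IL model: two sub-models, each remembering the ids it trained on."""

    def __init__(self) -> None:
        # Toy "model": a single float, the mean of the targets it was trained on.
        self.models: List[float] = [0.0, 0.0]
        self.seen_ids: List[Set[int]] = [set(), set()]

    def train_half(self, model_idx: int, samples: Dict[int, float]) -> None:
        """Train only sub-model `model_idx` and memorize the sample ids it saw."""
        self.models[model_idx] = sum(samples.values()) / len(samples)
        self.seen_ids[model_idx].update(samples.keys())

    def irreducible_loss(self, sample_id: int, target: float) -> float:
        """Score a sample with the sub-model that has NOT seen it."""
        # Assumption: samples seen by model 0 go to model 1; all others go to model 0.
        model_idx = 1 if sample_id in self.seen_ids[0] else 0
        return (self.models[model_idx] - target) ** 2


twin = TwinILSketch()
twin.train_half(0, {0: 1.0, 1: 3.0})  # first half trains model 0 (mean 2.0)
twin.train_half(1, {2: 5.0, 3: 7.0})  # second half trains model 1 (mean 6.0)

loss_for_0 = twin.irreducible_loss(0, 1.0)  # scored by model 1
loss_for_2 = twin.irreducible_loss(2, 5.0)  # scored by model 0
```

Because the routing lives inside the twin model, a caller only ever asks for the irreducible loss of a sample id and does not need to know which half the sample belonged to.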

How it works

1. At the selector, RHOLossDownsamplingStrategy randomly samples half of the training set and marks the used column in the selector_state_metadata table as True for those samples. The strategy issues a request to train a RHOLOSSTwinModel on this TSS. (unimplemented)
2. At the trainer server, RHOLOSSTwinModel is instantiated. Only model 0 is trained on this dataset (implemented in this PR).
3. At the selector, RHOLossDownsamplingStrategy produces the other half of the training set by selecting the samples with used==False. The strategy issues a request to finetune this twin model. (unimplemented)
4. At the trainer server, RHOLOSSTwinModel is instantiated again. Only model 1 is trained on this dataset (implemented in this PR).
5. At the selector, (optionally) clear the used flags.
6. At the trainer server, when training the main model: nothing needs to change, as the logic is handled internally in the twin model.
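The selector-side split described above can be sketched as follows. This is only an illustration under stated assumptions: an in-memory dict stands in for the used column of the selector_state_metadata table, and the function name `split_with_used_flag` is hypothetical rather than part of Modyn.

```python
import random
from typing import Dict, List, Tuple


def split_with_used_flag(
    sample_ids: List[int], seed: int = 0
) -> Tuple[List[int], List[int], Dict[int, bool]]:
    """Split sample ids into two halves by marking the first half as used."""
    rng = random.Random(seed)
    # Stand-in for the `used` column: every sample starts as unused.
    used = {sid: False for sid in sample_ids}

    # First request: randomly sample half of the training set and mark it as used.
    first_half = rng.sample(sample_ids, len(sample_ids) // 2)
    for sid in first_half:
        used[sid] = True

    # Second request: the other half is everything with used == False.
    second_half = [sid for sid in sample_ids if not used[sid]]
    return first_half, second_half, used


all_ids = list(range(10))
first_half, second_half, used = split_with_used_flag(all_ids)

# Optional cleanup: clear the used flags afterwards.
cleared = {sid: False for sid in used}
```

The two halves are disjoint and together cover the whole training set, which is what lets each IL model score exactly the samples the other one was trained on.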

Admittedly, this is not the optimal way to train a twin RHO model, but it is very straightforward, and we can optimize it depending on how well it performs.

Current drawbacks

Because it relies on the used field, RHOLoss is currently not compatible with presampling strategies that also use the used field, such as FreshnessSamplingStrategy.

Next PR

Implement steps 1 and 3: preparing the split holdout set.

How to review

All the main logic is in modyn/models/rho_loss_twin_model/rho_loss_twin_model.py

codecov[bot] commented 2 weeks ago

Codecov Report

Attention: Patch coverage is 97.36842% with 2 lines in your changes missing coverage. Please review.

Project coverage is 82.92%. Comparing base (59ea026) to head (2992df2).

:exclamation: Current head 2992df2 differs from pull request most recent head cddc9ea

Please upload reports for the commit cddc9ea to get more accurate results.

| Files | Patch % | Lines |
|---|---|---|
| ...ig/schema/pipeline/sampling/downsampling_config.py | 83.33% | 1 Missing :warning: |
| ...pling_strategies/rho_loss_downsampling_strategy.py | 87.50% | 1 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #547      +/-   ##
==========================================
+ Coverage   82.84%   82.92%   +0.08%
==========================================
  Files         220      221       +1
  Lines       10235    10298      +63
==========================================
+ Hits         8479     8540      +61
- Misses       1756     1758       +2
```

eth-easl / modyn