As per #137, this PR introduces a potentially major GPU memory saving: it loads a single reference model, precomputes the "pretrained/reference model" components of the usual loss for ALL batches up front, and then runs unlearning on that same model. This increases the RAM usage of the unlearning run (depending on the unlearning sample size), but requires only a single model in GPU memory at a time. A minimal sketch of the precompute-then-free pattern is below.
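For illustration, here is a minimal sketch of the pattern (the function name, the `input_ids`/`attention_mask` batch keys, and the assumption of a Hugging Face-style causal LM returning `.logits` are all hypothetical, not this repo's actual API):

```python
import torch

@torch.no_grad()
def precompute_reference_logprobs(ref_model, dataloader, device="cuda"):
    """Run the frozen reference model over every batch once and cache the
    per-token log-probs needed by the unlearning loss, so the reference
    model can be freed before training starts."""
    ref_model.eval().to(device)
    cached = []
    for batch in dataloader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        logits = ref_model(input_ids=input_ids, attention_mask=attention_mask).logits
        # Log-prob of each realized next token under the reference model.
        logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
        token_logprobs = logprobs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
        # Cache on CPU: this is the RAM cost mentioned above, traded for GPU memory.
        cached.append(token_logprobs.cpu())
    return cached

# After caching, the reference model can be released before unlearning:
#   del ref_model; torch.cuda.empty_cache()
```

The cached values then replace the live reference-model forward pass inside the unlearning loss, so only the model being unlearned needs to stay on the GPU.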
Closes #137