As per #137, this PR introduces a potentially major GPU memory saving: it loads a single reference model, precomputes the "pretrained/reference model" components of the usual loss for ALL batches up front, and then runs unlearning on that same model. This increases the RAM usage of the unlearning run (depending on the unlearning sample size), but requires only a single model in GPU memory at a time. A minimal sketch of the precompute-then-free pattern is below.
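For illustration, here is a minimal sketch of the pattern (the function name, the `input_ids`/`attention_mask` batch keys, and the assumption of a Hugging Face-style causal LM returning `.logits` are all hypothetical, not this repo's actual API):

```python
import torch

@torch.no_grad()
def precompute_reference_logprobs(ref_model, dataloader, device="cuda"):
    """Run the frozen reference model over every batch once and cache the
    per-token log-probs needed by the unlearning loss, so the reference
    model can be freed before training starts."""
    ref_model.eval().to(device)
    cached = []
    for batch in dataloader:
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        logits = ref_model(input_ids=input_ids, attention_mask=attention_mask).logits
        # Log-prob of each realized next token under the reference model.
        logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
        token_logprobs = logprobs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
        # Cache on CPU: this is the RAM cost mentioned above, traded for GPU memory.
        cached.append(token_logprobs.cpu())
    return cached

# After caching, the reference model can be released before unlearning:
#   del ref_model; torch.cuda.empty_cache()
```

The cached values then replace the live reference-model forward pass inside the unlearning loss, so only the model being unlearned needs to stay on the GPU.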
Closes #137