Implementation of the gradient accumulation feature described in #64.
The validation of the method was done by comparing the embedding space with two different, semantically identical configurations (logs are in the issue). On top of that, I'll conduct one larger training run on the cloud GPU to see if everything runs smoothly.
Implementation of the gradient accumulation feature described in #64.
The validation of the method was done by comparing the embedding space with two different, semantically identical configurations (logs are in the issue). On top of that, I'll conduct one larger training run on the cloud GPU to see if everything runs smoothly.