Update:
eval_accumulation_steps does not work, since it accumulates all tensors in CPU RAM instead.
What works so far is not returning hidden_states and attentions.
However, I do not understand why this is not an issue for the training loop.
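In case it helps, a minimal sketch of how these extra outputs can be disabled via the model config (the checkpoint name is only a placeholder, not our actual setup):

```python
# Minimal sketch, assuming a standard transformers setup;
# "bert-base-uncased" is only a placeholder checkpoint.
from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained(
    "bert-base-uncased",
    output_hidden_states=False,  # do not return all intermediate layer states
    output_attentions=False,     # do not return attention matrices
)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", config=config
)
```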
I additionally added a callback after each epoch that calls torch.cuda.empty_cache(), which seems to free the memory after the training loop.
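For reference, a minimal sketch of such a callback (class name and usage are illustrative, not the exact code from my run):

```python
import torch
from transformers import TrainerCallback


class EmptyCacheCallback(TrainerCallback):
    """Illustrative callback: release cached CUDA memory at the end of each epoch."""

    def on_epoch_end(self, args, state, control, **kwargs):
        torch.cuda.empty_cache()  # frees blocks held by PyTorch's caching allocator
        return control


# Passed to the Trainer via: Trainer(..., callbacks=[EmptyCacheCallback()])
```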
Hi all,
I am currently experimenting with your provided code. Your plot indicating memory usage for the different batch sizes & max_length seems to fit our training setup perfectly. However, when monitoring the memory usage, two things are noticeable:
I could not find a solution for 1.
For 2., setting eval_accumulation_steps seems to work, as it transfers the model outputs to the CPU.
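For reference, this is roughly how I set it (values are placeholders; as noted in the update above, this only moves the accumulation into CPU RAM):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",              # placeholder
    per_device_eval_batch_size=8,  # placeholder
    eval_accumulation_steps=10,    # move accumulated prediction tensors to the CPU every 10 steps
)
```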
Do you have an idea?
Keep up the great work.
Best wishes, Frederik