glistering96 / AlphaRouter

Gradient accumulation #16

Open glistering96 opened 1 year ago

glistering96 commented 1 year ago

Gradient accumulation would be good to implement.

glistering96 commented 1 year ago

Refer to this:

https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255
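
For context, here is a minimal, framework-independent sketch of the idea, not this repository's training loop. It assumes a toy linear model `y = w * x` with mean squared error; the names `grad_mse`, `accumulated_grad`, and `micro_batch_size` are hypothetical and chosen only for illustration. It shows that summing micro-batch gradients, each scaled by `1 / accum_steps`, gives the same gradient as one large batch, provided the micro-batches are equal in size.

```python
# Hypothetical toy example: linear model y = w * x with mean squared error.
# Nothing here is taken from the AlphaRouter code base.

def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) over the batch with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batch_size):
    """Accumulate gradients over micro-batches before a single update."""
    micro_batches = [batch[i:i + micro_batch_size]
                     for i in range(0, len(batch), micro_batch_size)]
    accum_steps = len(micro_batches)
    total = 0.0
    for mb in micro_batches:
        # Scale each micro-batch gradient by 1/accum_steps so the sum
        # matches the full-batch mean gradient (assumes equal-sized
        # micro-batches).
        total += grad_mse(w, mb) / accum_steps
    return total


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5
lr = 0.01

full = grad_mse(w, data)                                # one large batch
accum = accumulated_grad(w, data, micro_batch_size=2)   # two micro-batches

# A single parameter update is applied only after all micro-batches.
w_new = w - lr * accum
```

In a deep learning framework the same pattern usually means dividing each micro-batch loss by the number of accumulation steps, calling backward on every micro-batch, and stepping the optimizer (then clearing gradients) only once per accumulation window.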