bitextor / bicleaner-ai

Bicleaner fork that uses neural networks
GNU General Public License v3.0

Add gradient accumulation option for bicleaner-ai training #27

Open · radinplaid opened 1 year ago

radinplaid commented 1 year ago

The wiki recommends a batch size of 128 for 'stable training'.

It would be helpful to have an option to accumulate gradients so that bicleaner-ai can be trained with a larger "effective batch size" on GPUs with relatively little RAM.

Fairseq calls this option "--update-freq"; Sockeye calls it "--update-interval".
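
For illustration, this is the arithmetic such an option would enable (the option name `gradient_accumulation` below is hypothetical, chosen only to mirror the fairseq/Sockeye flags):

```python
# Hypothetical sketch: how a gradient accumulation option would let a small
# per-step batch reach the recommended effective batch size of 128.
batch_size = 32                # micro-batch that fits in GPU memory
gradient_accumulation = 4      # hypothetical option, analogous to fairseq's --update-freq
effective_batch_size = batch_size * gradient_accumulation  # 32 * 4 = 128
```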

ZJaume commented 1 year ago

Hi @radinplaid, I agree and I've been thinking about it since I wrote the tool. Unfortunately, TensorFlow does not support it natively, so it would require replacing the TensorFlow training loop with a hand-written one. Maybe at some point I'll have time to implement it. I'm happy to accept PRs if someone wants to write it.
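
A rough sketch of what such a hand-written loop could look like in TensorFlow 2 follows; the function and variable names are purely illustrative and this is not bicleaner-ai's actual training code:

```python
import tensorflow as tf

def train_with_accumulation(model, optimizer, loss_fn, dataset, accum_steps=4):
    """Run one epoch, applying gradients every `accum_steps` micro-batches."""
    # Buffers that sum gradients across micro-batches.
    accum_grads = [tf.Variable(tf.zeros_like(v), trainable=False)
                   for v in model.trainable_variables]
    step = 0
    for x, y in dataset:
        with tf.GradientTape() as tape:
            preds = model(x, training=True)
            # Scale the loss so the accumulated gradient matches one big batch.
            loss = loss_fn(y, preds) / accum_steps
        grads = tape.gradient(loss, model.trainable_variables)
        for buf, g in zip(accum_grads, grads):
            if g is not None:
                buf.assign_add(g)
        step += 1
        if step % accum_steps == 0:
            # Apply the accumulated gradients, then reset the buffers.
            optimizer.apply_gradients(zip(accum_grads, model.trainable_variables))
            for buf in accum_grads:
                buf.assign(tf.zeros_like(buf))
```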