Open radinplaid opened 1 year ago
Hi @radinplaid, I agree, and I've been thinking about it since I built the tool. Unfortunately, TensorFlow does not support it natively, so it would require us to replace the TensorFlow training loop with a hand-written one. Maybe at some point I'll have time to implement it. I'd gladly accept PRs if someone wants to write it.
The wiki suggests a batch size of 128 is recommended for 'stable training'.
It would be helpful to have the option to accumulate gradients, so that bicleaner-ai training with a larger "effective batch size" is possible on GPUs with a relatively small amount of RAM.
Fairseq calls this option "--update-freq"; Sockeye calls it "--update-interval".
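To make the request concrete, here is a minimal sketch of what gradient accumulation does, written in plain Python on a toy 1-D linear model rather than against bicleaner-ai or TensorFlow. All function names are hypothetical and only for illustration. It shows that summing gradients over several small micro-batches and applying one update gives the same step as a single large batch, assuming the loss is a simple per-example average:

```python
# Hypothetical illustration only: none of these names come from
# bicleaner-ai, TensorFlow, Fairseq, or Sockeye.

def grad_sum(w, b, batch):
    """Summed gradient of 0.5 * (w*x + b - y)**2 over the examples in batch."""
    gw = gb = 0.0
    for x, y in batch:
        err = w * x + b - y
        gw += err * x
        gb += err
    return gw, gb

def step_full_batch(w, b, batch, lr):
    """One SGD update using the whole batch at once."""
    gw, gb = grad_sum(w, b, batch)
    n = len(batch)
    return w - lr * gw / n, b - lr * gb / n

def step_accumulated(w, b, batch, lr, micro_batch_size):
    """One SGD update, but gradients are computed micro-batch by micro-batch.

    Only one micro-batch has to be processed at a time, which is what
    keeps peak memory low on a real accelerator.
    """
    acc_w = acc_b = 0.0
    for start in range(0, len(batch), micro_batch_size):
        micro = batch[start:start + micro_batch_size]
        gw, gb = grad_sum(w, b, micro)
        acc_w += gw  # accumulate instead of updating the weights
        acc_b += gb
    n = len(batch)  # normalise by the effective batch size
    return w - lr * acc_w / n, b - lr * acc_b / n

# Effective batch size 128, processed as 8 micro-batches of 16.
data = [(float(i), 2.0 * i + 1.0) for i in range(128)]
full = step_full_batch(0.0, 0.0, data, lr=1e-4)
accum = step_accumulated(0.0, 0.0, data, lr=1e-4, micro_batch_size=16)
```

In this toy setting `full` and `accum` agree up to floating-point rounding. In a real model the match is only approximate when layers depend on per-batch statistics (batch normalisation, for example), since those would still see the small micro-batch.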