Thanks for this wonderful work!
Could you please share your training time (e.g., on how many GPUs) and the total number of epochs? Did you use an early stopping scheme?
We train on a single GPU, taking about a day per round: 100k steps on the synthetic data (though 20k would suffice) and 20k steps when fine-tuning. No, we didn't use early stopping or vary the learning rate.
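In case it helps with reproduction, here is a minimal PyTorch sketch of that recipe. Only the step counts, the single device, the fixed learning rate, and the absence of early stopping come from the reply above; the model, datasets, batch size, and learning-rate value are illustrative placeholders, not the repository's actual code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Illustrative stand-ins -- the model and datasets are hypothetical,
# not the ones used in this work.
device = "cuda" if torch.cuda.is_available() else "cpu"  # single GPU
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # fixed LR, no scheduler
loss_fn = nn.MSELoss()

def train(loader, num_steps):
    """Run a fixed number of optimizer steps; no early stopping."""
    step = 0
    while step < num_steps:
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            step += 1
            if step >= num_steps:
                return

# Dummy tensors standing in for the synthetic and fine-tuning data.
synthetic = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))
finetune = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))

# Round 1: 100k steps on synthetic data (20k would suffice, per the reply).
train(DataLoader(synthetic, batch_size=32, shuffle=True), num_steps=100_000)
# Round 2: 20k steps of fine-tuning.
train(DataLoader(finetune, batch_size=32, shuffle=True), num_steps=20_000)
```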