Hi, thanks for your great work.
I am running the dialogue-related experiments. I couldn't find a time estimate for the dialogue tasks (there are descriptions and issues about QG and QQP, but not dialogue). Additionally, it seems the model needs to be trained for 140k steps, which could take 5+ days even on around 4 A100 GPUs.
Could you share more details about your GPU settings and running times on the different tasks? I suspect this is something we should optimize.
Thanks
Hi,
The time you estimate is close to ours, with 4 80GB A100 GPUs. Using FP16 could reduce training time (we didn't implement this in the current version of the code).
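Since FP16 isn't implemented in the released code, one option is to wrap the existing training step with PyTorch's automatic mixed precision. The sketch below is a minimal, hypothetical example and not the repo's actual training loop: the model, optimizer, and loss here are placeholders, and you'd apply `autocast`/`GradScaler` around your own forward/backward pass.

```python
# Minimal sketch of FP16 mixed-precision training with PyTorch AMP.
# All names (model, optimizer, loss_fn) are placeholders, not the repo's code.
import torch

def train_step(model, optimizer, scaler, batch, targets, loss_fn, use_amp):
    optimizer.zero_grad()
    # Forward pass runs selected ops in FP16 when AMP is enabled.
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = loss_fn(model(batch), targets)
    # Scale the loss to avoid FP16 gradient underflow; scaler unscales on step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

# Toy usage (runs on CPU with AMP disabled; set use_amp=True on CUDA):
use_amp = torch.cuda.is_available()
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
loss = train_step(model, optimizer, scaler,
                  torch.randn(8, 16), torch.randn(8, 4),
                  torch.nn.functional.mse_loss, use_amp)
```

With `enabled=False`, `autocast` and `GradScaler` are no-ops, so the same step function works unchanged on hardware without AMP support.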