Shark-NLP / DiffuSeq

[ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
MIT License
734 stars 89 forks

Problem about Running Time on Dialogue dataset #24

Open zhiyuanhubj opened 1 year ago

zhiyuanhubj commented 1 year ago

Hi, thanks for your great work.

I am running the dialogue experiments. I couldn't find a time estimate for the dialogue task (there are descriptions and issues about QG and QQP). Additionally, it seems the model should be trained for 140k steps, which means training may take 5+ days even with around 4 A100s.

Would you mind sharing more details about the GPU resource settings and running times for the different tasks? I suppose this may be something worth optimizing.

Thanks

summmeer commented 1 year ago

Hi, the time you estimated is close to ours, with 4 80GB A100 GPUs. Using FP16 could save training time (we haven't implemented this in the current version of the code).
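
For anyone planning a similar run, here is a back-of-the-envelope sketch of the wall-clock math discussed above. The ~3.1 s/step figure is an assumption chosen to match the "140k steps in 5+ days" estimate, not a number measured from this repo, and the 2x FP16 speedup is an idealized best case:

```python
# Rough wall-clock estimate for a DiffuSeq-style training run.
# Both sec_per_step values below are hypothetical, not measured from the repo.

def training_days(total_steps: int, sec_per_step: float) -> float:
    """Convert a step count and a per-step time into days of wall-clock."""
    return total_steps * sec_per_step / 86_400  # 86,400 seconds per day

fp32_days = training_days(140_000, 3.1)        # assumed ~3.1 s/step in fp32
fp16_days = training_days(140_000, 3.1 / 2)    # idealized 2x FP16 speedup

print(f"fp32: ~{fp32_days:.1f} days, fp16 (ideal 2x): ~{fp16_days:.1f} days")
# → fp32: ~5.0 days, fp16 (ideal 2x): ~2.5 days
```

In practice FP16/AMP rarely delivers a full 2x, so treat the second number as a lower bound on training time.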