Open HanClinto opened 2 months ago
I'm curious about the hardware and time requirements for reproducing the paper's results. What sort of hardware did you use, and how long did it take to train each epoch?
Hi Clint, we trained the models and sampled episodes using 32 A100 GPUs with CUDA 11.0. Training time depends on how many GPUs you have. With 8 A100 GPUs, one epoch takes 3-4 hours. The sampling process is much slower, though, and will probably take 1-2 days.
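For anyone budgeting a reproduction on different hardware, here is a rough back-of-the-envelope sketch that scales the quoted 8-GPU figure (3-4 hours per epoch) to other GPU counts. It assumes ideal linear scaling with the number of A100s, which is an assumption on my part and not something stated in the thread; real scaling is usually worse because of communication overhead and batch-size effects. The function name `estimate_epoch_hours` and its parameters are hypothetical, and the sampling time (1-2 days) is not modeled at all.

```python
def estimate_epoch_hours(num_gpus, ref_gpus=8, ref_hours=(3.0, 4.0)):
    """Scale the quoted 8-GPU epoch time to another GPU count.

    ASSUMPTION: ideal linear scaling (twice the GPUs, half the time).
    Treat the result as an optimistic estimate, not a measurement.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    scale = ref_gpus / num_gpus
    low, high = ref_hours
    return (low * scale, high * scale)


if __name__ == "__main__":
    # Print a (low, high) hour range per epoch for a few GPU counts.
    for n in (1, 4, 8, 32):
        low, high = estimate_epoch_hours(n)
        print(f"{n:>2} GPUs: about {low:.2f}-{high:.2f} hours per epoch")
```

Under that assumption, a single A100 would need roughly 24-32 hours per epoch, so the GPU count matters a lot when planning a run.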