Hello @hero-y ,
We trained LGD on a cluster of three NVIDIA Quadro 4080 GPUs for 1000 epochs. Each GPU required about 10-20 GB of memory, if I remember correctly. Total training time was under 3 days.
> If it's scalable, could you provide a guideline on how the number of GPUs might affect the training process?
Yes, it is scalable. You can modify the number of GPUs by setting `--nproc_per_node=n`, where `n` is your desired number.
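As a minimal sketch, assuming the repo is launched with PyTorch's `torchrun` (or the older `python -m torch.distributed.launch`), which is where the `--nproc_per_node` flag comes from, a 4-GPU launch would look something like this. The script name `train.py` is a placeholder for whatever entry point the repo actually uses:

```bash
# Hypothetical launch command: train.py and its arguments are placeholders.
# --nproc_per_node is torchrun's standard flag for the number of processes
# (one per GPU) to spawn on this node.
torchrun --nproc_per_node=4 train.py
```

Note that with this kind of data-parallel launch, each process typically holds one GPU and its own copy of the per-GPU batch, so the effective global batch size scales with `n`; learning-rate schedules may need adjusting accordingly.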
I have a few questions regarding the computational resources required to train the models.
1. **GPU Memory Requirement:** Could you please specify the amount of GPU memory required for training the models discussed in the paper? I would also appreciate knowing whether any specific GPU models are recommended.
2. **Number of GPUs:** How many GPUs are needed to achieve the results presented in the paper? If it's scalable, could you provide a guideline on how the number of GPUs might affect the training process?
3. **Training Duration:** What is the approximate training time for the models using the recommended setup? Is this time frame based on a specific GPU configuration?