Closed liangjxiong closed 3 months ago
We did not use 4/8 GPUs. You might try the lr scaling rule for SGD. But may not generalize for Adam optimizer.
Thank you! Can you provide a script for testing inference time?
Multi-processing evaluation can bring lots of issues. We only use one GPU for evaluation.
Hello! Your work is very outstanding. If I use 4 or 8 GPUs for training, how should I set the learning rate and what should I set it to? I hope to hear back from you. Thank you!