Closed crangelsmith closed 8 months ago
Quick question: there isn't an option to choose whether or not to use Fabric. Is this important? In the tests I have run on a single GPU, the performance doesn't change significantly. Is that why there isn't an option?
If you are running on a single GPU, it is basically like not using Fabric, as its code implements the default PyTorch device handling under the hood. This would change if you used multiple GPUs or different strategies (ddp, fsdp).
Have you tried running on 2 GPUs with ddp? In my benchmarking this halves the running time (if Baskerville allows it...).
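For anyone wanting to try the single-GPU versus ddp comparison, here is a minimal sketch of how the Fabric settings could be derived from a GPU count. The helper names `fabric_kwargs` and `make_fabric` are hypothetical and not part of this PR, and the import path assumes the `lightning` package (the repo may import Fabric differently). Only the `accelerator`, `devices` and `strategy` arguments of `Fabric` are used.

```python
from typing import Any, Dict


def fabric_kwargs(num_gpus: int) -> Dict[str, Any]:
    """Hypothetical helper: map a GPU count to Fabric constructor arguments."""
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    if num_gpus == 0:
        # Local runs: fall back to CPU.
        return {"accelerator": "cpu", "devices": 1, "strategy": "auto"}
    if num_gpus == 1:
        # Single GPU: Fabric behaves like plain PyTorch device handling.
        return {"accelerator": "cuda", "devices": 1, "strategy": "auto"}
    # Several GPUs: one process per device with distributed data parallel.
    return {"accelerator": "cuda", "devices": num_gpus, "strategy": "ddp"}


def make_fabric(num_gpus: int):
    """Build and launch a Fabric object (assumes `lightning` is installed)."""
    # Imported lazily so fabric_kwargs can be used without Lightning installed.
    from lightning.fabric import Fabric

    fabric = Fabric(**fabric_kwargs(num_gpus))
    fabric.launch()
    return fabric
```

With this, `make_fabric(1)` and `make_fabric(2)` would give the two configurations discussed above, so the same training script can be benchmarked in both modes.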
278
This has been tested thoroughly on 1 GPU on Baskerville, and locally (where the default is CPU). There are some issues when trying to use multiple GPUs in one job, where this issue is sometimes encountered. It is not yet clear why and we are investigating (we suspect it is related to the Baskerville Slurm environment), but since 1 GPU is currently enough for the calculations, it should not affect our progress.
For review: use the tools/slurm_run.sh script and send a job on Baskerville for a preferred dataset.