witnessai opened 2 weeks ago
Did you set `--num-gpus` to 4? I don't have 4 GPUs for testing, so I have no idea yet what happens when it runs with 4 GPUs.
Yes, I set `--num-gpus 4`.
Did you change the batch size? It should be at least 4.
Yes, the batch size is set to 8.
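For reference, the batch-size requirement above can be expressed as a small sanity check. This is a minimal sketch, not code from this repo: the helper name `per_gpu_batch_size` is hypothetical, and the divisibility rule is an assumption based on trainers that split the total batch evenly across GPUs (detectron2's batch data loader, for example, requires the total batch size to be divisible by the number of GPUs).

```python
def per_gpu_batch_size(total_batch_size: int, num_gpus: int) -> int:
    """Split a total batch size evenly across GPUs.

    Hypothetical helper. Assumes every GPU must receive the same,
    non-zero number of images per iteration.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    # At least one image per GPU, so the batch size must be >= num_gpus.
    if total_batch_size < num_gpus:
        raise ValueError(
            f"batch size {total_batch_size} is smaller than num_gpus {num_gpus}; "
            "at least one image per GPU is required"
        )
    # Assumed even split: the batch size must divide evenly across GPUs.
    if total_batch_size % num_gpus != 0:
        raise ValueError(
            f"batch size {total_batch_size} is not divisible by num_gpus {num_gpus}"
        )
    return total_batch_size // num_gpus


# The settings from this thread: 8 images over 4 GPUs gives 2 images per GPU.
print(per_gpu_batch_size(8, 4))
```

With a batch size of 8, both the 2-GPU and 4-GPU runs pass this check, which suggests the batch size alone does not explain why only the 4-GPU run fails.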
When I run the program on 4 GPUs, it fails at line 343 of train_multidatasets.py: it gets stuck at the line `results = evaluator.evaluate()` inside the `inference_on_dataset` function. The error message is as follows:
Running the same program on 2 GPUs, however, produces no error. Do you know what might cause this? Could it be related to the shell script, which only uses 2 GPUs to run the program?