We train on 8 x V100 (32 GB) GPUs; the training command is:
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
# For example, train SETR-PUP on the Cityscapes dataset with 8 GPUs
./tools/dist_train.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py 8
Your error is most likely caused by insufficient GPU memory. You can try reducing the batch size by setting data = dict(samples_per_gpu=1) in the config file.
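For reference, the relevant block in an MMSegmentation-style config usually looks like the sketch below. The exact keys and surrounding values depend on the config file you are using, so treat this as an illustration rather than the repo's actual settings:

# Sketch of the data settings in an MMSegmentation-style config file
# (dataset entries such as train/val/test are omitted for brevity).
data = dict(
    samples_per_gpu=1,  # per-GPU batch size; reduce this first when you hit OOM
    workers_per_gpu=2,  # dataloader worker processes per GPU
)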
I ran into this error; how should I deal with it? Thanks a lot.
RuntimeError: CUDA out of memory. Tried to allocate 38.00 MiB (GPU 0; 24.00 GiB total capacity; 18.99 GiB already allocated; 18.86 MiB free; 19.31 GiB reserved in total by PyTorch)
I think the batch size is too big, or it's the GPU setting in the code, but I couldn't find where the batch size is set. Can this code only run on multiple GPUs?