Batch size modification

chandratejatiriveedhi commented 1 year ago

I am having trouble training the SETR model on the Cityscapes dataset with the config file SETR_PUP_768x768_40k_cityscapes_bs_8. I am training on a single GPU and get the following error: CUDA out of memory. Tried to allocate 326.00 MiB (GPU 0; 11.90 GiB total capacity; 10.88 GiB already allocated; 254.94 MiB free; 11.06 GiB reserved in total by PyTorch)

I would like to reduce the batch size from 8 to 1. Where can I do this in the config file? Is it data = dict(samples_per_gpu=1)? What is the ideal number of GPUs for training this model on the Cityscapes dataset?

Also, do you have an updated version of the code that runs on CUDA 11 and later?

sixiaozheng commented 1 year ago

Thank you for your interest in our work. You can change the batch size at https://github.com/fudan-zvg/SETR/blob/main/configs/SETR/SETR_MLA_768x768_80k_cityscapes_bs_8.py#L61 . If that does not solve the problem, you can try other datasets, try SETR-Naive, or change the image size at https://github.com/fudan-zvg/SETR/blob/main/configs/SETR/SETR_MLA_768x768_80k_cityscapes_bs_8.py#L7 . For a fair comparison with other papers, we train on 8 GPUs with one sample per GPU. If you want to run SETR on CUDA 11, we recommend trying the SETR implementation in mmsegmentation: https://github.com/open-mmlab/mmsegmentation/tree/main/configs/setr .
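For reference, the batch-size setting discussed here can be sketched as below. This is a minimal illustration, not the contents of the linked config file: the `samples_per_gpu` key comes from the question above, while `num_gpus`, the `workers_per_gpu` value, and the two batch-size variables are assumptions added for the example.

```python
# Sketch of the batch-size part of an mmsegmentation-style config.
# Values are illustrative; check the linked config for the real ones.
num_gpus = 1  # assumption: single-GPU training, as in the question

data = dict(
    samples_per_gpu=1,  # per-GPU batch size (the setting asked about)
    workers_per_gpu=2,  # data-loading workers per GPU (illustrative value)
)

# Effective batch size = number of GPUs x samples per GPU.
effective_batch_size = num_gpus * data["samples_per_gpu"]

# Reference setup described in this reply: 8 GPUs, one sample per GPU.
reference_batch_size = 8 * 1

print(effective_batch_size, reference_batch_size)
```

Note that with one sample per GPU already in use, a single GPU cannot lower per-GPU memory further through the batch size alone, which is likely why a smaller image size or a lighter model variant is suggested as the next step.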

chandratejatiriveedhi commented 1 year ago

Hi Zheng,

When I run the training script, I get the error AssertionError: Default process group is not initialized. What are the common causes of this error, and how can I resolve it? Is it caused by training on one GPU without distributed training?

Please let me know.

With Regards, Teja
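A commonly reported cause of this assertion in mmsegmentation-based code is a config that uses SyncBN, which needs an initialized distributed process group, while training is launched in non-distributed mode. This thread does not confirm that this is the cause here, so treat the following as a sketch under that assumption. The helper `replace_sync_bn` and the simplified `model` dict are hypothetical and only illustrate swapping `SyncBN` for plain `BN`; the other commonly suggested workaround is to start training through the distributed launcher even with one GPU.

```python
import copy


def replace_sync_bn(cfg):
    """Return a deep copy of a nested config with every type 'SyncBN' set to 'BN'."""
    cfg = copy.deepcopy(cfg)

    def _walk(node):
        if isinstance(node, dict):
            if node.get("type") == "SyncBN":
                node["type"] = "BN"
            for value in node.values():
                _walk(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                _walk(item)

    _walk(cfg)
    return cfg


# Hypothetical, heavily simplified model config, for illustration only.
model = dict(
    backbone=dict(
        type="ExampleBackbone",
        norm_cfg=dict(type="SyncBN", requires_grad=True),
    ),
    decode_head=dict(
        type="ExampleHead",
        norm_cfg=dict(type="SyncBN", requires_grad=True),
    ),
)

single_gpu_model = replace_sync_bn(model)
print(single_gpu_model["decode_head"]["norm_cfg"])
```

The copy keeps the original config untouched, so the same file can still be used unchanged for multi-GPU runs.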


chandratejatiriveedhi commented 1 year ago

Hi Zheng,

Do you have any updates on this, or on the common causes of this assertion error?

With Regards, Teja

