bowenc0221 / panoptic-deeplab

This is the PyTorch re-implementation of our CVPR 2020 paper "Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation" (https://arxiv.org/abs/1911.10194).
Apache License 2.0

Is it possible to train custom data with 1 GPU? #74

Closed: mvdelt closed this issue 3 years ago

mvdelt commented 3 years ago

I tried to train on Colab. I think only 1 GPU is available in Colab.

I ran: python train_net.py --config-file configs/Cityscapes-PanopticSegmentation/panoptic_deeplab_R_52_os16_mg124_poly_90k_bs32_crop_512_1024_dsconv.yaml

and the results:

Command Line Args: Namespace(config_file='configs/Cityscapes-PanopticSegmentation/panoptic_deeplab_R_52_os16_mg124_poly_90k_bs32_crop_512_1024_dsconv.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=1, num_machines=1, opts=[], resume=False)

...

  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/batchnorm.py", line 519, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 625, in get_world_size
    return _get_group_size(group)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 220, in _get_group_size
    _check_default_pg()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/distributed_c10d.py", line 211, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized

I checked related comments like this and this:

"SyncBatchNorm cannot run on 1 GPU by its definition."

Is there any way to train a custom dataset with only 1 GPU (or in Colab)?

bowenc0221 commented 3 years ago

As these issues suggest, you will need to disable SyncBN when training on a single GPU.
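For reference, something along these lines should work with the command-line config overrides that train_net.py accepts. The norm keys below are the ones I'd expect from the base YAML and are an assumption here; check your config for every entry set to "SyncBN" and override each of them:

python train_net.py --num-gpus 1 \
  --config-file configs/Cityscapes-PanopticSegmentation/panoptic_deeplab_R_52_os16_mg124_poly_90k_bs32_crop_512_1024_dsconv.yaml \
  MODEL.RESNETS.NORM "BN" \
  MODEL.SEM_SEG_HEAD.NORM "BN" \
  MODEL.INS_EMBED_HEAD.NORM "BN"

Equivalently, you can edit the YAML directly and replace the "SyncBN" entries with "BN".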

mvdelt commented 3 years ago

Thanks. I adjusted it and also reduced the batch size; now it trains well :)

git-haddadz commented 3 years ago

> Thanks. I adjusted it and also reduced the batch size; now it trains well :)

Hello, can you post the hyperparameters and batch size you used to make it run in Google Colab? I still can't find a way to run it.

mvdelt commented 3 years ago

> Thanks. I adjusted it and also reduced the batch size; now it trains well :)
>
> Hello, can you post the hyperparameters and batch size you used to make it run in Google Colab? I still can't find a way to run it.

Hi, I am currently hospitalized due to a sudden accident. If it is not too late, I will look for the information after discharge. Anyway, I think the batch size was about 4. Try it, and if it doesn't work, then try 2.
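If it helps before then, a single-GPU run along the lines of the following should be a reasonable starting point. This is only a sketch, not the exact command I used: SOLVER.IMS_PER_BATCH is the standard detectron2 batch-size key, and you may also want to scale SOLVER.BASE_LR down roughly in proportion to the smaller batch:

python train_net.py --num-gpus 1 \
  --config-file configs/Cityscapes-PanopticSegmentation/panoptic_deeplab_R_52_os16_mg124_poly_90k_bs32_crop_512_1024_dsconv.yaml \
  SOLVER.IMS_PER_BATCH 4 \
  MODEL.RESNETS.NORM "BN" \
  MODEL.SEM_SEG_HEAD.NORM "BN" \
  MODEL.INS_EMBED_HEAD.NORM "BN"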

git-haddadz commented 3 years ago

> Thanks. I adjusted it and also reduced the batch size; now it trains well :)
>
> Hello, can you post the hyperparameters and batch size you used to make it run in Google Colab? I still can't find a way to run it.
>
> Hi, I am currently hospitalized due to a sudden accident. If it is not too late, I will look for the information after discharge. Anyway, I think the batch size was about 4. Try it, and if it doesn't work, then try 2.

Thank you, and I wish you a speedy recovery. (I still haven't found a solution to my problem.)