KMnP / vpt

❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119

How to use multiple GPUs? #5

Closed · qianlanwyd closed this issue 2 years ago

qianlanwyd commented 2 years ago

If I set NUM_GPUS = 2, I get the error below. Could you please tell me how to use multiple GPUs?

Traceback (most recent call last):
  File "train.py", line 132, in <module>
    main(args)
  File "train.py", line 127, in main
    train(cfg, args)
  File "train.py", line 102, in train
    train_loader, val_loader, test_loader = get_loaders(cfg, logger)
  File "train.py", line 69, in get_loaders
    train_loader = data_loader.construct_trainval_loader(cfg)
  File "/home/haoc/wangyidong/vpt/src/data/loader.py", line 79, in construct_trainval_loader
    drop_last=drop_last,
  File "/home/haoc/wangyidong/vpt/src/data/loader.py", line 39, in _construct_loader
    sampler = DistributedSampler(dataset) if cfg.NUM_GPUS > 1 else None
  File "/home/haoc/miniconda3/envs/prompt/lib/python3.7/site-packages/torch/utils/data/distributed.py", line 65, in __init__
    num_replicas = dist.get_world_size()
  File "/home/haoc/miniconda3/envs/prompt/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 638, in get_world_size
    return _get_group_size(group)
  File "/home/haoc/miniconda3/envs/prompt/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 220, in _get_group_size
    _check_default_pg()
  File "/home/haoc/miniconda3/envs/prompt/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 211, in _check_default_pg
    "Default process group is not initialized"
AssertionError: Default process group is not initialized
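
The assertion fires because src/data/loader.py constructs a DistributedSampler whenever cfg.NUM_GPUS > 1, but nothing in the training entry point initializes PyTorch's default process group first. As a rough sketch, the missing step would look like the standard torch.distributed setup below; the backend choice and env-var names are stock PyTorch, not anything from this repo:

    import os

    import torch
    import torch.distributed as dist

    # torchrun (or torch.distributed.launch) exports RANK, WORLD_SIZE and
    # LOCAL_RANK; the env:// init method reads them to build the group.
    dist.init_process_group(backend="nccl", init_method="env://")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # Only after this call can DistributedSampler query dist.get_world_size()
    # instead of raising "Default process group is not initialized".
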
KMnP commented 2 years ago

Hi, at this time we don't support multi-GPU training. We found that a single GPU is sufficient for the experiments in this project.

qianlanwyd commented 2 years ago

What GPU did you use?

qianlanwyd commented 2 years ago

In fact, I can't run the experiments on a single 2080 Ti. What is the minimum GPU memory needed to run all the experiments?

KMnP commented 2 years ago

We used A100 (40GB) GPUs for all experiments. I also recommend reading our paper, especially the Appendix, for more implementation details if you have further questions. If you run into CUDA out-of-memory errors, I recommend reducing the batch size, or implementing distributed training yourself.
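
For anyone who does want to go the distributed route, here is a minimal, self-contained sketch of the standard DistributedDataParallel pattern. The toy model, dataset, and hyperparameters are placeholders; none of this comes from the vpt codebase. Launch it with something like torchrun --nproc_per_node=2 ddp_sketch.py:

    import os

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

    def main():
        # torchrun sets the env vars that the env:// init method reads.
        dist.init_process_group(backend="nccl", init_method="env://")
        local_rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(local_rank)

        # Toy stand-ins for the real model and dataset.
        model = DDP(torch.nn.Linear(16, 2).cuda(), device_ids=[local_rank])
        dataset = TensorDataset(torch.randn(128, 16),
                                torch.randint(0, 2, (128,)))

        # This is the sampler the vpt loader builds when cfg.NUM_GPUS > 1;
        # it shards the dataset across ranks.
        sampler = DistributedSampler(dataset)
        loader = DataLoader(dataset, batch_size=16, sampler=sampler)

        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        loss_fn = torch.nn.CrossEntropyLoss()
        for epoch in range(2):
            sampler.set_epoch(epoch)  # reshuffle shards each epoch
            for x, y in loader:
                x, y = x.cuda(), y.cuda()
                opt.zero_grad()
                loss_fn(model(x), y).backward()  # DDP all-reduces grads here
                opt.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Note that the effective global batch size is batch_size × world_size, so halving the per-GPU batch size (the other suggestion above) while running on two GPUs keeps the global batch size unchanged.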