fudan-zvg / SETR

[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
MIT License

About selection of GPU #35

Status: Open. EricStarer opened this issue 3 years ago.

EricStarer commented 3 years ago

How do I select which GPUs to use when training with multiple GPUs? Thanks a lot.

sixiaozheng commented 3 years ago

You can do this by setting the environment variable CUDA_VISIBLE_DEVICES to the IDs of the GPUs you want to use, and passing the number of GPUs as ${GPU_NUM}:

CUDA_VISIBLE_DEVICES=${GPU id list} ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
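A note on how the variable behaves (general CUDA behaviour, not something specific to SETR): the GPUs listed in CUDA_VISIBLE_DEVICES are renumbered inside each process, in the order given, so with CUDA_VISIBLE_DEVICES=4,5,6,7 the training processes see them as cuda:0 to cuda:3. The sketch below maps a process-local index back to the physical GPU ID. The function name physical_gpu_id is hypothetical, and it assumes the variable holds plain integer IDs (CUDA also accepts GPU UUIDs, which this sketch does not handle).

```python
import os


def physical_gpu_id(local_index, visible_devices=None):
    """Map a process-local CUDA index (cuda:N) to the physical GPU ID.

    Hypothetical helper; assumes CUDA_VISIBLE_DEVICES contains integer IDs only.
    """
    if visible_devices is None:
        visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible_devices is None:
        # Variable unset: every GPU is visible and no remapping happens.
        return local_index
    # "4,5,6,7" -> [4, 5, 6, 7]; int() tolerates surrounding spaces.
    ids = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    if not 0 <= local_index < len(ids):
        raise IndexError(
            "cuda:{} is not visible (visible IDs: {})".format(local_index, ids)
        )
    return ids[local_index]
```

For example, physical_gpu_id(0, "4,5,6,7") returns 4: the process that uses cuda:0 is actually running on physical GPU 4.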

For example, to train SETR-PUP on the Cityscapes dataset with 4 GPUs:

CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py 4
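If you script your runs, one way to keep ${GPU_NUM} consistent with the GPU list is to derive it from the list. This is a minimal sketch: build_dist_train_command is a hypothetical helper, not part of the SETR repository, and it only assembles the environment and arguments for ./tools/dist_train.sh in the form used in the command above.

```python
import os


def build_dist_train_command(config_file, gpu_ids, base_env=None):
    """Hypothetical helper: return (env, argv) for ./tools/dist_train.sh.

    GPU_NUM is computed from gpu_ids, so it always matches the number of
    GPUs exposed through CUDA_VISIBLE_DEVICES.
    """
    gpu_ids = list(gpu_ids)
    if not gpu_ids:
        raise ValueError("gpu_ids must contain at least one GPU ID")
    if any(not isinstance(i, int) or i < 0 for i in gpu_ids):
        raise ValueError("gpu_ids must be non-negative integers")
    if len(set(gpu_ids)) != len(gpu_ids):
        raise ValueError("gpu_ids must not contain duplicates")

    # Copy the parent environment so PATH, conda settings, etc. are preserved.
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    # Last argument is GPU_NUM, i.e. the number of training processes to start.
    argv = ["./tools/dist_train.sh", config_file, str(len(gpu_ids))]
    return env, argv
```

From the repository root you could then launch with subprocess.run(argv, env=env, check=True), for instance with gpu_ids=[0, 1, 2, 3] and the Cityscapes config above.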

EricStarer commented 3 years ago

Thanks a lot, but I ran into this error. How can I deal with it?

Traceback (most recent call last):
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/site-packages/torch/distributed/launch.py", line 173, in <module>
    main()
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/site-packages/torch/distributed/launch.py", line 169, in main
    run(args)
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/site-packages/torch/distributed/run.py", line 624, in run
    )(*cmd_args)
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 348, in wrapper
    return f(*args, **kwargs)
  File "/home/jing_liang/anaconda3/envs/zhaoxing/lib/python3.6/site-packages/torch/distributed/launcher/api.py", line 247, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

sixiaozheng commented 2 years ago

Your environment may not be installed correctly. I recommend checking the package versions, or reinstalling the environment by following the "A from-scratch setup script" section of the README.
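When comparing your environment against the README's setup script, a quick version dump can help. This is a minimal sketch, assuming the usual dependencies of an mmsegmentation-based codebase (torch, torchvision, mmcv); adjust the package list to whatever the README actually pins. collect_versions is a hypothetical helper name, and the torch calls it uses (torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count()) are standard PyTorch attributes.

```python
import importlib
import platform


def collect_versions(packages=("torch", "torchvision", "mmcv")):
    """Return a dict mapping each import name to its version string.

    None means the package is not installed; an "import failed" entry means
    it is installed but broken (e.g. a compiled extension that does not load).
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ModuleNotFoundError as exc:
            if exc.name == name:
                versions[name] = None  # the package itself is missing
            else:
                # The package exists but one of its dependencies is missing.
                versions[name] = "import failed: {!r}".format(exc)
            continue
        except Exception as exc:  # broken installs can raise almost anything
            versions[name] = "import failed: {!r}".format(exc)
            continue
        versions[name] = getattr(module, "__version__", "unknown")

    # Only inspect CUDA if torch was requested and imported successfully.
    torch_version = versions.get("torch")
    if torch_version is not None and not torch_version.startswith("import failed"):
        import torch

        versions["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
        versions["cuda_available"] = torch.cuda.is_available()
        versions["visible_gpu_count"] = torch.cuda.device_count()
    return versions
```

Running print(collect_versions()) with the same CUDA_VISIBLE_DEVICES you use for training also shows whether visible_gpu_count matches the ${GPU_NUM} you pass to dist_train.sh.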