Training on a single GPU

LeapLabTHU / Agent-Attention

Official repository of Agent Attention (ECCV2024)


Training on a single GPU #2

Closed: YEKAI-2022 closed this issue 8 months ago

YEKAI-2022 commented 8 months ago

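Hello, while debugging the code on my local machine, I ran into distributed-training errors involving local_rank and world_size. Are my arguments set incorrectly?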

[Screenshot: Snipaste_2023-12-20_16-07-49]

YEKAI-2022 commented 8 months ago

My local machine is a single node with a single GPU. How should I resolve distributed-training errors like this when debugging?

tian-qing001 commented 8 months ago

Hello @YEKAI-2022. For training our module on a single GPU, the recommended command is:

python -m torch.distributed.launch --nproc_per_node=1 main.py --cfg <path-to-config-file> --data-path <imagenet-path> --output <output-path>
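For `--nproc_per_node=1`, the launcher starts one copy of `main.py` and typically exports the variables that `env://` initialization reads (`MASTER_ADDR`, `MASTER_PORT`, `RANK`, `WORLD_SIZE`), plus `LOCAL_RANK` for the script's own use. The sketch below assumes the launcher's defaults (master address 127.0.0.1, port 29500) and fills in any of those values that are missing, so a script can be started under a debugger without the launcher. The helper name `ensure_single_process_env` is hypothetical and not part of this repository.

```python
import os
from typing import MutableMapping

# Values the launcher would typically export for one process on one node.
# 127.0.0.1:29500 mirrors the launcher's default master address and port.
SINGLE_PROCESS_ENV = {
    "MASTER_ADDR": "127.0.0.1",
    "MASTER_PORT": "29500",
    "RANK": "0",
    "WORLD_SIZE": "1",
    "LOCAL_RANK": "0",
}


def ensure_single_process_env(
    environ: MutableMapping[str, str] = os.environ,
) -> MutableMapping[str, str]:
    """Add any missing launcher variables; values already set are kept as-is."""
    for key, value in SINGLE_PROCESS_ENV.items():
        environ.setdefault(key, value)
    return environ
```

Called with no argument, it updates the real process environment, so it would go at the top of a debug script, before `torch.distributed.init_process_group(..., init_method="env://")`. The launcher also passes a local-rank argument to the script (unless `--use_env` is given), so a script that declares `--local_rank` as a required argument still needs it on its own command line.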

Upon reviewing the screenshot you shared, it appears that the command you are currently using is functionally equivalent to:

python main.py --cfg <path-to-config-file> --data-path <imagenet-path> --output <output-path>
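Run this way, no launcher is involved, so none of the distributed variables are exported, and code that reads them (for example `os.environ["RANK"]` or `env://` initialization) fails before training starts. A small hypothetical check like the one below makes that cause explicit; the variable list is an assumption based on what `env://` initialization expects, and the function names are illustrative only.

```python
import os
from typing import List, Mapping

# Variables read by torch.distributed's env:// initialization.
REQUIRED_VARS = ("MASTER_ADDR", "MASTER_PORT", "RANK", "WORLD_SIZE")


def missing_launcher_vars(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required distributed variables that are not set."""
    return [name for name in REQUIRED_VARS if name not in environ]


def check_launched(environ: Mapping[str, str] = os.environ) -> None:
    """Raise a readable error if the script was not started by a launcher."""
    missing = missing_launcher_vars(environ)
    if missing:
        raise RuntimeError(
            "Missing distributed variables: "
            + ", ".join(missing)
            + ". Start the script with python -m torch.distributed.launch "
            "--nproc_per_node=1, or set them manually."
        )
```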

tian-qing001 commented 8 months ago

If your objective is solely to debug the model, consider creating a new script that loads the model directly, without executing main.py.

YEKAI-2022 commented 8 months ago

When debugging, do the torch.distributed.launch --nproc_per_node=1 arguments above also need to be added to the argument list along with --cfg, --data-path, and --output?

tian-qing001 commented 8 months ago

@YEKAI-2022 You can refer to sections 3.2 and 3.3 of this blog. Note that these settings are not specific to our project; instructions for them are widely available online.
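When mapping the launcher command onto a debugger's run configuration, it helps to separate what belongs to the launcher from what belongs to `main.py`: `--nproc_per_node=1` is a launcher option and comes before the script name, while `--cfg`, `--data-path`, and `--output` follow it. The hypothetical helper below makes that split explicit; it assumes the command has the form `python -m <module> [launcher options] <script>.py [script arguments]` and is only an illustration, not part of this repository.

```python
import shlex
from typing import Dict, List, Union


def split_launch_command(cmd: str) -> Dict[str, Union[str, List[str]]]:
    """Split 'python -m <module> [launcher opts] <script>.py [script args]'."""
    tokens = shlex.split(cmd)
    if len(tokens) < 4 or tokens[1] != "-m":
        raise ValueError("expected: python -m <module> ... <script>.py ...")
    module, rest = tokens[2], tokens[3:]
    # The first token ending in .py is taken to be the training script.
    script_idx = next((i for i, tok in enumerate(rest) if tok.endswith(".py")), None)
    if script_idx is None:
        raise ValueError("no .py script found in the command")
    return {
        "module": module,
        "launcher_args": rest[:script_idx],
        "script": rest[script_idx],
        "script_args": rest[script_idx + 1:],
    }


cmd = (
    "python -m torch.distributed.launch --nproc_per_node=1 main.py "
    "--cfg <path-to-config-file> --data-path <imagenet-path> --output <output-path>"
)
parts = split_launch_command(cmd)
```

In IDEs that let a run configuration target a module instead of a script (PyCharm, for example), this roughly corresponds to running the module `torch.distributed.launch` with the launcher options, the script name, and then the script arguments as its parameters, in that order.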

YEKAI-2022 commented 8 months ago

Thank you very much for your answer.