ShiqiYu / OpenGait

A flexible and extensible framework for gait recognition. With OpenGait, you can focus on designing your own models and easily compare them with state-of-the-art methods.
665 stars, 154 forks

Eliminate deprecation warning by replacing `torch.distributed.launch` with `torchrun` #148

DavidLee528 closed this pull request 8 months ago

DavidLee528 commented 11 months ago

I got a deprecation warning when running `train.sh` and `test.sh`:

....../torch/distributed/launch.py:181: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun.

`torch.distributed.launch` is deprecated, and `torchrun` is its replacement.

`torchrun` provides a superset of the functionality of `torch.distributed.launch`, with the following additions:

  1. Worker failures are handled gracefully by restarting all workers.
  2. Worker RANK and WORLD_SIZE are assigned automatically.
  3. Number of nodes is allowed to change between minimum and maximum sizes (elasticity).

See the PyTorch `torchrun` documentation for more details.
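A practical consequence of item 2: `torchrun` passes rank information to each worker through the `LOCAL_RANK`, `RANK` and `WORLD_SIZE` environment variables, whereas `torch.distributed.launch` (without `--use_env`) appended a `--local_rank` argument to the script's command line. A training entry point that should work under either launcher could resolve its rank roughly as below. This is a minimal sketch, not OpenGait's actual code; the helper name `resolve_dist_info` is hypothetical.

```python
import argparse
import os


def resolve_dist_info(argv=None, environ=None):
    """Return (local_rank, rank, world_size) for the current worker.

    Hypothetical helper: prefers the environment variables exported by
    torchrun and falls back to the legacy --local_rank argument that
    torch.distributed.launch appended when --use_env was not given.
    """
    environ = os.environ if environ is None else environ

    # Parse only the legacy flag; every other script argument is ignored.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--local_rank", "--local-rank", type=int, default=None)
    args, _ = parser.parse_known_args(argv)

    if "LOCAL_RANK" in environ:
        local_rank = int(environ["LOCAL_RANK"])  # set by torchrun
    elif args.local_rank is not None:
        local_rank = args.local_rank  # legacy torch.distributed.launch
    else:
        local_rank = 0  # plain single-process run, no launcher

    # Both launchers export RANK and WORLD_SIZE; default to a single worker.
    rank = int(environ.get("RANK", local_rank))
    world_size = int(environ.get("WORLD_SIZE", 1))
    return local_rank, rank, world_size
```

Each worker would then typically call `torch.cuda.set_device(local_rank)` before `torch.distributed.init_process_group(...)`. Reading the environment first means the script keeps working if the legacy flag is dropped from the launch command.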

ChaoFan996 commented 10 months ago

Thanks for sharing! Is this change only supported for torch>2.0?

DavidLee528 commented 10 months ago


Dear @ChaoFan996, thanks for your reply.

Running `torchrun` is equivalent to invoking `python -m torch.distributed.run`, as described in the PyTorch documentation.
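Because the two forms are equivalent, a launcher can prefer the `torchrun` console script and fall back to `python -m torch.distributed.run` when `torchrun` is not on `PATH`. The sketch below assumes a hypothetical entry script `main.py` and a hypothetical helper `build_launch_cmd`; `--nproc_per_node` and `--max_restarts` are standard `torchrun` options, the latter controlling the worker-restart behaviour mentioned in the list above.

```python
import shutil
import sys


def build_launch_cmd(script, script_args, nproc_per_node, max_restarts=0):
    """Build a torchrun launch command as an argv list (hypothetical helper)."""
    torchrun = shutil.which("torchrun")
    if torchrun is not None:
        launcher = [torchrun]
    else:
        # Same entry point as torchrun, reached through the current interpreter.
        launcher = [sys.executable, "-m", "torch.distributed.run"]

    return launcher + [
        f"--nproc_per_node={nproc_per_node}",
        f"--max_restarts={max_restarts}",  # 0 matches torchrun's default
        script,
        *script_args,
    ]
```

For example, `subprocess.run(build_launch_cmd("main.py", ["--phase", "train"], 2), check=True)` would start two workers on the local node. Returning an argv list rather than a shell string avoids quoting problems.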

`torch.distributed.run` is already present in the PyTorch 1.13.1 source code, so the change does not require torch>2.0:

[Screenshot: `torch.distributed.run` in the PyTorch 1.13.1 source]
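Whether the installed PyTorch ships the elastic launcher can also be checked at runtime by looking for the module itself instead of comparing version strings. A minimal sketch; the helper name `module_available` is hypothetical:

```python
import importlib.util


def module_available(name="torch.distributed.run"):
    """Return True if module `name` can be found, without importing it.

    For a dotted name, find_spec imports the parent packages (here `torch`
    and `torch.distributed`), but not the module itself.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # A parent package (e.g. torch itself) is missing or fails to import.
        return False
```

A launcher script could call `module_available()` once and print a clear error if it returns `False`, rather than failing later with a less obvious import error.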

github-actions[bot] commented 8 months ago

Stale pull request message