facebookresearch / ClassyVision

An end-to-end PyTorch framework for image and video classification
https://classyvision.ai
MIT License

Is this comment about distributed training out of date? #740

Closed: cyy53589 closed this issue 3 years ago

cyy53589 commented 3 years ago

Hi, I think you're doing a great job and ClassyVision is elegant to use. I have a question and would appreciate it if you could answer it.

Here's the comment in classy_train.py whose example command I ran: https://github.com/facebookresearch/ClassyVision/blob/f6d0cbc46ffe26df505fd1f618f2ee76790619b0/classy_train.py#L23-L33

I got:

classy_train.py: error: unrecognized arguments: --device=gpu --num_workers=1

It seems that the options --device and --num_workers are deprecated.

I also found the option --distributed_backend DISTRIBUTED_BACKEND. Would setting --distributed_backend=ddp be enough for distributed training? Are there any other code changes, scripts, or arguments I should be aware of?

mannatsingh commented 3 years ago

Hi @cyy53589! Glad that you like Classy Vision!

You're right, we now automatically detect the device, and the data loader workers are specified directly in the dataset configuration!

For single-node multi-GPU training, running with the torch distributed launcher and passing --distributed_backend=ddp should be enough! Please let me know if you run into any issues!

Let me fix the documentation in the meantime!
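
For anyone landing here later, a rough sketch of what the updated single-node launch might look like, written as a small Python helper that only assembles and prints the command rather than running it. Only --distributed_backend=ddp and the removal of --device / --num_workers come from this thread. The rest are assumptions: that classy_train.py accepts --config, that the launcher's --use_env flag is appropriate for this script, and the config path configs/my_config.json is purely hypothetical.

```python
import shlex
import sys


def build_launch_command(config_path, num_gpus, script="classy_train.py"):
    """Assemble the argv for a single-node, multi-GPU run (sketch only).

    Nothing is executed here; the function just returns the argument list.
    """
    # Arguments for the torch distributed launcher (one process per GPU).
    launcher = [
        sys.executable, "-m", "torch.distributed.launch",
        "--nnodes=1",
        "--node_rank=0",
        f"--nproc_per_node={num_gpus}",
        "--use_env",  # assumption: the training script reads ranks from env vars
    ]
    # Arguments for the training script itself. Note there is no --device
    # or --num_workers: per this thread, the device is auto-detected and
    # data loader workers are set in the dataset configuration.
    training_args = [
        script,
        f"--config={config_path}",  # assumption: flag name and path are illustrative
        "--distributed_backend=ddp",
    ]
    return launcher + training_args


if __name__ == "__main__":
    cmd = build_launch_command("configs/my_config.json", num_gpus=2)
    # Print a shell-quoted command line that could be pasted into a terminal.
    print(shlex.join(cmd))
```

Set num_gpus to the number of GPUs on the machine. For the exact arguments the script accepts, `./classy_train.py --help` remains the authoritative source.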