andreeaiana / newsreclib

PyTorch-Lightning Library for Neural News Recommendation
https://newsreclib.readthedocs.io/en/latest/
MIT License

Can I run a job with pytorch distributed training? #3

Open chiyuzhang94 opened 8 months ago

chiyuzhang94 commented 8 months ago

Can I run a job with PyTorch distributed training? If I run this command, does it work? torchrun --nproc_per_node=$WORLD_SIZE --master_port=1234 newsreclib/train.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent

andreeaiana commented 8 months ago

You can run a job with PyTorch distributed training by changing the trainer's accelerator, strategy, and number of devices. For example, you can use the ddp trainer config.
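As a rough sketch of what such a ddp trainer config could contain (the exact filename, `defaults` entry, and keys here are assumptions based on common Lightning + Hydra project layouts, not taken from the repository; the keys mirror Lightning's `Trainer` arguments):

```yaml
# Hypothetical configs/trainer/ddp.yaml — a sketch, not the library's actual file.
# Each key maps to an argument of Lightning's Trainer.
defaults:
  - default          # assumed base trainer config

accelerator: gpu     # run on GPUs
strategy: ddp        # DistributedDataParallel across devices
devices: 4           # number of GPUs per node
num_nodes: 1         # single-machine training
sync_batchnorm: true # synchronize BatchNorm statistics across processes
```

Any of these values can also be overridden per run from the command line via Hydra's dotted syntax (e.g. trainer.devices=2).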

Alternatively, you can do this from the command line as python newsreclib/train.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent trainer.accelerator=gpu trainer.strategy=ddp trainer.devices=4. Note that with trainer.strategy=ddp, Lightning launches the worker processes itself, so starting the script with plain python is sufficient and torchrun is not required.