facebookresearch / atlas

Code repository supporting the paper "Atlas: Few-shot Learning with Retrieval Augmented Language Models" (https://arxiv.org/abs/2208.03299)

How do you conduct distributed training? #17

Closed Polymorphy12 closed 1 year ago

Polymorphy12 commented 1 year ago

Hello, I'm using 2 GPUs on a single node to train Atlas.

However, even if I set the local_rank to 0, the training doesn't start. It still requires MASTER_ADDR, MASTER_PORT, etc.

Is there anything else I need to configure?

mlomeli1 commented 1 year ago

I'm afraid I can't help unless you add more context about how exactly you are trying to do this.
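
For readers hitting the same error: PyTorch's env:// initialization for torch.distributed reads MASTER_ADDR, MASTER_PORT, WORLD_SIZE and RANK from the environment of each process, so setting local_rank alone is not enough. The sketch below shows what those variables could look like for 2 workers on one node. The helper name single_node_env, the address 127.0.0.1 and the port 29500 are illustrative assumptions, and this thread does not confirm whether Atlas's own launch code expects exactly these variables.

```python
def single_node_env(rank, world_size, master_addr="127.0.0.1", master_port=29500):
    """Build the variables that PyTorch's env:// rendezvous reads for one worker.

    Hypothetical helper for illustration; it is not part of Atlas or PyTorch.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return {
        "MASTER_ADDR": master_addr,       # assumed: all workers share this machine
        "MASTER_PORT": str(master_port),  # assumed: any free port, same for all workers
        "WORLD_SIZE": str(world_size),    # total number of processes
        "RANK": str(rank),                # global index of this process
        "LOCAL_RANK": str(rank),          # on a single node, local rank equals global rank
    }


if __name__ == "__main__":
    # One process per GPU: print the environment each of the 2 workers would need.
    for rank in range(2):
        print(rank, single_node_env(rank, world_size=2))
```

Each worker process would export its own set of values before calling torch.distributed.init_process_group with init_method="env://". Launching through torchrun with --nproc_per_node=2 sets these variables for every worker automatically, which avoids exporting them by hand.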