dropreg / R-Drop


How to use data parallelism in R-Drop #32

Closed xinxinxing closed 1 year ago

xinxinxing commented 1 year ago

How do I use data parallelism in R-Drop? When I train with --distributed-world-size, the BLEU score is lower than on a single GPU. Here is my train.sh: [screenshot of train.sh]

apeterswu commented 1 year ago

Hi @xinxinxing, the default parallel mode in fairseq is data parallelism. If you are running on a single node with 4 GPU cards, you don't need to specify --distributed-world-size; just remove that argument and run normally, and fairseq will use all visible GPUs.
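
A minimal sketch of what that launch could look like, assuming a standard fairseq installation; the data-bin path, architecture, and hyperparameters below are placeholders, not the exact flags from the original train.sh, and any R-Drop-specific arguments from the repo should be kept as they were:

```shell
# Restrict training to the GPUs you want; fairseq detects all visible
# devices and runs data parallelism across them by default on a single
# node, so no --distributed-world-size is needed.
CUDA_VISIBLE_DEVICES=0,1,2,3 fairseq-train data-bin/my_dataset \
    --arch transformer \
    --optimizer adam --lr 0.0005 \
    --max-tokens 4096 \
    --save-dir checkpoints/rdrop_run
```

Note that with 4 GPUs the effective batch size is roughly 4x the single-GPU run (max-tokens is per GPU), which can change BLEU if the learning rate or warmup schedule is not adjusted accordingly.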