allenai / longformer

Longformer: The Long-Document Transformer
https://arxiv.org/abs/2004.05150
Apache License 2.0

Initialization for large-model-training is far too slow #211

Open CaoYiqingT opened 2 years ago

CaoYiqingT commented 2 years ago

Training the large model by following cheatsheet.txt gets stuck during initialization. The console has been showing 'initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/3' for over an hour.
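To make the status line easier to read, here is a minimal sketch that parses it. It assumes the message has the form `GLOBAL_RANK: <rank>, MEMBER: <rank + 1>/<world size>`, which is how I understand this kind of DDP log line; the helper name `parse_ddp_status` is hypothetical and not part of the longformer repo. Under that assumption, 'MEMBER: 1/3' means rank 0 is the first of 3 expected processes, so the hang may be rank 0 waiting for the other two to join.

```python
import re

# Assumed format of the status line (not confirmed by the repo):
#   initializing ddp: GLOBAL_RANK: <rank>, MEMBER: <member>/<world_size>
STATUS_RE = re.compile(
    r"initializing ddp: GLOBAL_RANK: (\d+), MEMBER: (\d+)/(\d+)"
)


def parse_ddp_status(line):
    """Return rank, member index, and world size from a status line.

    Returns None if the line does not match the assumed format.
    """
    match = STATUS_RE.search(line)
    if match is None:
        return None
    rank, member, world_size = (int(g) for g in match.groups())
    return {"global_rank": rank, "member": member, "world_size": world_size}


status = parse_ddp_status("initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/3")
print(status)
```

If only the rank 0 line ever appears, it would be worth checking whether the other 2 processes were actually launched and can reach rank 0.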