HuangLK / transpeeder

Train LLaMA on a single A100 80G node using 🤗 transformers and 🚀 DeepSpeed Pipeline Parallelism
Apache License 2.0
208 stars, 18 forks

Refine dp #42

Closed: JY-Ren closed this 11 months ago

JY-Ren commented 11 months ago
  1. Fix the data parallelism (dp) issue: dp had no effect when the dp size was greater than 1.
  2. Add a dialog data conversion script.
  3. Remove inv_freq from the module buffers.
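
For readers unfamiliar with how data parallelism combines with pipeline parallelism, here is a minimal sketch of what "dp greater than 1" usually implies: the devices are grouped into replicas of the pipeline, and each replica should read a different slice of the data. This is not the code from this change. The helper names `dp_size_from_world` and `shard_indices`, and the strided split itself, are assumptions chosen for illustration.

```python
from typing import List


def dp_size_from_world(world_size: int, pp_stages: int) -> int:
    """Number of data-parallel replicas when each replica spans pp_stages devices.

    Hypothetical helper: assumes devices are split evenly into pipeline replicas.
    """
    if pp_stages < 1 or world_size < 1:
        raise ValueError("world_size and pp_stages must be positive")
    if world_size % pp_stages != 0:
        raise ValueError("world_size must be divisible by pp_stages")
    return world_size // pp_stages


def shard_indices(num_samples: int, dp_rank: int, dp_size: int) -> List[int]:
    """Return the sample indices that one data-parallel replica should read.

    Hypothetical helper: uses a strided split, so replica r takes samples
    r, r + dp_size, r + 2 * dp_size, and so on.
    """
    if dp_size < 1:
        raise ValueError("dp_size must be at least 1")
    if not 0 <= dp_rank < dp_size:
        raise ValueError("dp_rank must be in [0, dp_size)")
    return list(range(dp_rank, num_samples, dp_size))


if __name__ == "__main__":
    # Example: 8 devices with a 4-stage pipeline give 2 data-parallel replicas.
    dp = dp_size_from_world(world_size=8, pp_stages=4)
    for rank in range(dp):
        print(rank, shard_indices(num_samples=10, dp_rank=rank, dp_size=dp))
```

If every replica reads the same indices, adding replicas does not increase the amount of data seen per step, which is one way dp can end up "not effective". Whether that was the cause of the issue fixed here is not stated in the description.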