bigscience-workshop Megatron-DeepSpeed issues

bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Other

1.31k stars 213 forks

issues

Why does pretrain_llama_distributed.sh use pretrain_gpt.py?

#404 BrucePeng92 closed 3 weeks ago
0
How can I set recomputation-granularity, like selective or full?

#403 LordEdison opened 4 months ago
0
Bump black from 21.4b0 to 24.3.0

#402 dependabot[bot] opened 5 months ago
0
Hello, what version of the megatron-lm library is your code modified from?

#401 4thGardenOfQMH opened 6 months ago
0
Is this assertion for mask wrong?

#400 yinfangchen opened 7 months ago
1
Feature/tigerbot

#399 i4never closed 10 months ago
0
Hello, can Megatron-DeepSpeed pre-train llama2?

#398 13416157913 opened 11 months ago
0
Cannot run 3D parallelism with tp == 1 dp == 3 pp == 2 degrees

#397 Heelim-Hong closed 1 year ago
0
Is a training log like this normal? I do not find the loss in the logs, and what does "grad norm: nan" mean?

#396 alphanlp opened 1 year ago
0
The difference between zero-3 and megatron with zero-2

#395 nicosouth opened 1 year ago
0
Question about the implementation of mpu.cross_entropy when using tensor parallel

#394 robin087 opened 1 year ago
0
Feature/tigerbot

#393 i4never closed 1 year ago
0
questions about inconsistent evaluation result

#392 coorful opened 1 year ago
0
stage3 error: IndexError: list index out of range

#391 PhdShi closed 1 year ago
1
ModuleNotFoundError: No module named 'packaging' when install apex

#390 SeekPoint closed 1 year ago
3
ModuleNotFoundError: No module named 'torch' when run 'pip install -e .', but pytorch exists

#389 SeekPoint closed 1 year ago
2
Question about ds to universal

#388 saxh opened 1 year ago
0
RuntimeError: Error building extension 'scaled_upper_triang_masked_softmax_cuda'

#387 zll0000 opened 1 year ago
1
Hello, I ran into a problem

#386 etoilestar opened 1 year ago
8
How to properly use Flops Profiler with pipelined parallelism?

#385 flyingdown opened 1 year ago
0
Fix/dataloader error

#384 EastInsure closed 1 year ago
0
pip install -e . failed with ModuleNotFoundError: No module named 'torch'

#383 SeekPoint opened 1 year ago
2
Help me, I'm dying soon, error: command '/opt/rh/devtoolset-7/root/usr/bin/gcc' failed with exit code 1 error: subprocess-exited-with-error

#382 listwebit opened 1 year ago
0
Megatron-DeepSpeed only applies to specific models?

#381 Bob-cby opened 1 year ago
0
Universal checkpoints and MP states

#380 aitorormazabal closed 1 year ago
2
The given group does not exist pytorch

#379 germanjke opened 1 year ago
2
upgrade megatron-lm

#378 dz1iang opened 1 year ago
0
How can we access the gradients while the model is training?

#377 BilgehanSel opened 1 year ago
0
how to do prompt learning with bloom?

#376 moseshu opened 1 year ago
0
How to freeze some layers of GPT and fine-tune only the last k layers?

#375 joan126 opened 1 year ago
0
How to convert model weights (e.g., bigscience/bloomz-560m-optimizer-states) to Hugging Face model.bin file?

#374 qazwsx042 closed 1 year ago
1
Can I use Python-only apex for gpt_pretrain?

#373 Luoyang144 opened 1 year ago
0
how to pretrain t5-lm adapted?

#372 nanyyyyyy opened 1 year ago
0
How to preprocess data for t5 model?

#371 xiu-ze opened 1 year ago
0
Add xPos embeddings

#370 janEbert opened 1 year ago
0
Exception: cuda rng state model-parallel-rng is not added

#369 520jefferson opened 1 year ago
1
Adapt to DCU

#368 hepj987 closed 1 year ago
0
Fix various small problems

#367 janEbert opened 1 year ago
0
How to continue pre-training Bloom?

#366 ShinoharaHare opened 1 year ago
2
Bloom model training with AML

#365 savitamittal1 opened 1 year ago
0
Are there any other layer norm functions, such as RMSNorm or DeepNorm?

#364 lvcc2018 opened 1 year ago
0
Is there any script for pretraining/fine-tuning Bloom?

#363 drxmy opened 1 year ago
0
Bsevalharness

#362 Muennighoff closed 1 year ago
0
Does bigscience's Megatron-DeepSpeed support ZeRO-stage2+cpu offload?

#361 drxmy closed 1 year ago
0
Fatal error: cuda_fp16.h: No such file or directory on ROCm

#360 lvcc2018 opened 1 year ago
1
Fine-tuning bloom 176b with bitfit

#359 drxmy closed 1 year ago
2
Add UL2 data sampling and pretraining

#358 janEbert opened 1 year ago
3
Add FlashAttention

#357 NouamaneTazi opened 1 year ago
3
User Warnings for accessing grad attribute of non-leaf Tensors thrown with TP=1 and PP>1

#356 chelseajohn opened 1 year ago
3
deepspeed_to_megatron several issues

#355 MatejUlcar opened 1 year ago
4