bigscience-workshop/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Other
1.31k
stars
213
forks
issues
Why pretrain_llama_distributed.sh use pretrain_gpt.py ?
#404
BrucePeng92
closed
3 weeks ago
0
How can I set recomputation-granularity,like selective or full?
#403
LordEdison
opened
4 months ago
0
Bump black from 21.4b0 to 24.3.0
#402
dependabot[bot]
opened
5 months ago
0
Hello, what version of the megatron-lm library is your code modified?
#401
4thGardenOfQMH
opened
6 months ago
0
Is this assertion for mask wrong?
#400
yinfangchen
opened
7 months ago
1
Feature/tigerbot
#399
i4never
closed
10 months ago
0
Hello, can Megatron-DeepSpeed pre-train llama2?
#398
13416157913
opened
11 months ago
0
Cannot run 3D parallelism with tp == 1 dp == 3 pp == 2 degrees
#397
Heelim-Hong
closed
1 year ago
0
the traing log like this is Normal? I do not find loss in the logs, and what does the "grad norm: nan" mean?
#396
alphanlp
opened
1 year ago
0
The difference between zero-3 and megatron with zero-2
#395
nicosouth
opened
1 year ago
0
Question about the implementation of mpu.cross_entropy when using tensor parallel
#394
robin087
opened
1 year ago
0
Feature/tigerbot
#393
i4never
closed
1 year ago
0
questions about inconsistent evaluation result
#392
coorful
opened
1 year ago
0
stage3 error: IndexError: list index out of range
#391
PhdShi
closed
1 year ago
1
ModuleNotFoundError: No module named 'packaging' when install apex
#390
SeekPoint
closed
1 year ago
3
ModuleNotFoundError: No module named 'torch' when run 'pip install -e .', but pytorch exists
#389
SeekPoint
closed
1 year ago
2
Question about ds to universal
#388
saxh
opened
1 year ago
0
RuntimeError: Error building extension 'scaled_upper_triang_masked_softmax_cuda'
#387
zll0000
opened
1 year ago
1
hello, I meet a problem
#386
etoilestar
opened
1 year ago
8
How to properly use Flops Profiler with pipelined parallelism?
#385
flyingdown
opened
1 year ago
0
Fix/dataloader error
#384
EastInsure
closed
1 year ago
0
pip install -e . failed with ModuleNotFoundError: No module named 'torch'
#383
SeekPoint
opened
1 year ago
2
Help me, I'm dying soon,error: command '/opt/rh/devtoolset-7/root/usr/bin/gcc' failed with exit code 1 error: subprocess-exited-with-error
#382
listwebit
opened
1 year ago
0
Megatron-DeepSpeed only applies to specific models?
#381
Bob-cby
opened
1 year ago
0
Universal checkpoints and MP states
#380
aitorormazabal
closed
1 year ago
2
The given group does not exist pytorch
#379
germanjke
opened
1 year ago
2
upgrade megatron-lm
#378
dz1iang
opened
1 year ago
0
How can we access to the gradients while the model is training?
#377
BilgehanSel
opened
1 year ago
0
how to do prompt learning with bloom?
#376
moseshu
opened
1 year ago
0
how to frozen some layers of GPT, only fintune last k layers?
#375
joan126
opened
1 year ago
0
How to convert model weights(e.g., bigscience/bloomz-560m-optimizer-states) to Hugging Face model.bin file?
#374
qazwsx042
closed
1 year ago
1
Can I use python only apex for gpt_pretrain?
#373
Luoyang144
opened
1 year ago
0
how to pretrain t5-lm adapted?
#372
nanyyyyyy
opened
1 year ago
0
How to preprocess data for t5 model?
#371
xiu-ze
opened
1 year ago
0
Add xPos embeddings
#370
janEbert
opened
1 year ago
0
Exception: cuda rng state model-parallel-rng is not added
#369
520jefferson
opened
1 year ago
1
Adapt to DCU
#368
hepj987
closed
1 year ago
0
Fix various small problems
#367
janEbert
opened
1 year ago
0
How to continue pre-training Bloom?
#366
ShinoharaHare
opened
1 year ago
2
Bloom model training with AML
#365
savitamittal1
opened
1 year ago
0
Are there any other layer norm functions, such as RMSNorm or DeepNorm
#364
lvcc2018
opened
1 year ago
0
Is there any script for pretraining/funting Bloom?
#363
drxmy
opened
1 year ago
0
Bsevalharness
#362
Muennighoff
closed
1 year ago
0
Does bigscienece's Megatron-DeepSpeed support ZeRO-stage2+cpu offload?
#361
drxmy
closed
1 year ago
0
Fatal error: cuda_fp16.h: No such file or directory on ROCm
#360
lvcc2018
opened
1 year ago
1
fintuning bloom 176b with bitfit
#359
drxmy
closed
1 year ago
2
Add UL2 data sampling and pretraining
#358
janEbert
opened
1 year ago
3
Add FlashAttention
#357
NouamaneTazi
opened
1 year ago
3
User Warnings for accessing grad attribute of non-leaf Tensors thrown with TP=1 and PP>1
#356
chelseajohn
opened
1 year ago
3
deepspeed_to_megatron several issues
#355
MatejUlcar
opened
1 year ago
4