bigscience-workshop / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
1.31k stars · 213 forks

Issues (sorted newest first)
#    | Title                                                                                                | Author          | State  | When        | Comments
#304 | Combine Specs                                                                                        | Muennighoff     | closed | 2 years ago | 1
#303 | PrefixLM didn't work because we didn't pass attention_mask correctly                                 | thomasw21       | closed | 2 years ago | 1
#302 | CI fixes                                                                                             | stas00          | closed | 2 years ago | 0
#301 | [WIP] Hack my way to get OPT running                                                                 | thomasw21       | open   | 2 years ago | 0
#300 | [MLM] Train script for non causal decoder                                                            | thomasw21       | open   | 2 years ago | 0
#299 | [MTF] Add `weighted-split-paths` support                                                             | thomasw21       | closed | 2 years ago | 0
#298 | MTF optimize dataloading                                                                             | thomasw21       | closed | 2 years ago | 2
#297 | resolve conflict                                                                                     | lintangsutawika | closed | 2 years ago | 1
#296 | Add P3 preparation script                                                                            | Muennighoff     | closed | 2 years ago | 0
#295 | MTF train script                                                                                     | thomasw21       | closed | 2 years ago | 1
#294 | Merge MLM too fast 2                                                                                 | thomasw21       | closed | 2 years ago | 0
#293 | MTF dataset and packing                                                                              | thomasw21       | closed | 2 years ago | 1
#292 | a branch combining layer-norm-auto-sync and ds_ckpt_reshape                                          | stas00          | open   | 2 years ago | 0
#291 | BigScience Eval Harness                                                                              | Muennighoff     | open   | 2 years ago | 0
#290 | Merged MLM too fast.                                                                                 | thomasw21       | closed | 2 years ago | 0
#289 | No-ZeRO reshaping                                                                                    | Muennighoff     | open   | 2 years ago | 0
#288 | Mtf finetuning                                                                                       | lintangsutawika | closed | 2 years ago | 3
#287 | Mlm adaptation                                                                                       | lintangsutawika | closed | 2 years ago | 4
#286 | WIP: Shared t5 code                                                                                  | thomasw21       | open   | 2 years ago | 0
#285 | Initialize dist with DS                                                                              | Quentin-Anthony | closed | 2 years ago | 0
#284 | MLM adaptation and Multitask Finetuning                                                              | lintangsutawika | closed | 2 years ago | 4
#283 | Fix tflops glu computation                                                                           | Muennighoff     | closed | 2 years ago | 3
#282 | [valid] deadlock workaround                                                                          | stas00          | closed | 2 years ago | 5
#281 | Fix mixed fused layer norm to mimic nn.LayerNorm for torch>1.11                                      | thomasw21       | closed | 2 years ago | 2
#280 | DeepSpeedCheckpoint needs to support bf16 optimizer states.                                          | thomasw21       | open   | 2 years ago | 0
#279 | [new feature] add a `tag` to the arguments to load the checkpoint from a specific step (not necessarily the latest) | SaulLu | open | 2 years ago | 0
#278 | [doc] start-fast instructions                                                                        | stas00          | closed | 2 years ago | 0
#277 | Thomas/olruwase/sync layer norms                                                                     | thomasw21       | closed | 2 years ago | 0
#276 | Make dataloader use another random generator                                                         | thomasw21       | closed | 2 years ago | 4
#275 | [WIP] add debug utils                                                                                | stas00          | open   | 2 years ago | 0
#274 | Sync 4 layer norms - bf16, fp32, optimizer states on restart                                         | tjruwase        | open   | 2 years ago | 0
#273 | `torch.testing.assert_equal` didn't make it                                                          | stas00          | closed | 2 years ago | 0
#272 | sync layer norms                                                                                     | stas00          | closed | 2 years ago | 1
#271 | Sync layer norm                                                                                      | thomasw21       | open   | 2 years ago | 0
#270 | Test different layer norm                                                                            | thomasw21       | open   | 2 years ago | 0
#269 | [tensorboard] add rename and remove event tools                                                      | stas00          | closed | 2 years ago | 0
#268 | [kill switch] fix test                                                                               | stas00          | closed | 2 years ago | 0
#267 | [TB] disable samples-per-dataset, steps-per-dataset, tokens-per-dataset                              | stas00          | closed | 2 years ago | 0
#266 | [kill switch] correct sys.exit                                                                       | stas00          | closed | 2 years ago | 0
#265 | c10d crashes on `sys.exit(0)`                                                                        | stas00          | closed | 2 years ago | 10
#264 | Preprocessing from arrow file to load an HF dataset                                                  | TevenLeScao     | open   | 2 years ago | 1
#263 | launch debug code                                                                                    | stas00          | open   | 2 years ago | 0
#262 | [embed norm] switch to apex MixedFusedLayerNorm                                                      | stas00          | closed | 2 years ago | 0
#261 | allocate embed norm only on pp0                                                                      | stas00          | closed | 2 years ago | 0
#260 | sync the whole Meg-LM fused_kernels sub-tree                                                         | stas00          | closed | 2 years ago | 4
#259 | Fix softmax                                                                                          | thomasw21       | closed | 2 years ago | 4
#258 | deploy elastic error handler                                                                         | stas00          | closed | 2 years ago | 0
#257 | Fix padded vocab size on preprocessing scripts                                                       | thomasw21       | closed | 2 years ago | 0
#256 | make partition_method configurable                                                                   | stas00          | closed | 2 years ago | 0
#255 | add `pad-vocab-size-to` argument and tests                                                           | SaulLu          | closed | 2 years ago | 1