bigscience-workshop Megatron-DeepSpeed issues

bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Other

1.31k stars, 213 forks

issues

Combine Specs

#304 Muennighoff closed 2 years ago
1
PrefixLM didn't work because we didn't pass attention_mask correctly

#303 thomasw21 closed 2 years ago
1
CI fixes

#302 stas00 closed 2 years ago
0
[WIP] Hack my way to get OPT running

#301 thomasw21 opened 2 years ago
0
[MLM] Train script for non causal decoder

#300 thomasw21 opened 2 years ago
0
[MTF] Add `weighted-split-paths` support

#299 thomasw21 closed 2 years ago
0
MTF optimize dataloading

#298 thomasw21 closed 2 years ago
2
resolve conflict

#297 lintangsutawika closed 2 years ago
1
Add P3 preparation script

#296 Muennighoff closed 2 years ago
0
MTF train script

#295 thomasw21 closed 2 years ago
1
Merge MLM too fast 2

#294 thomasw21 closed 2 years ago
0
MTF dataset and packing

#293 thomasw21 closed 2 years ago
1
a branch combining layer-norm-auto-sync and ds_ckpt_reshape

#292 stas00 opened 2 years ago
0
BigScience Eval Harness

#291 Muennighoff opened 2 years ago
0
Merged MLM too fast.

#290 thomasw21 closed 2 years ago
0
No-ZeRO reshaping

#289 Muennighoff opened 2 years ago
0
Mtf finetuning

#288 lintangsutawika closed 2 years ago
3
Mlm adaptation

#287 lintangsutawika closed 2 years ago
4
WIP: Shared t5 code

#286 thomasw21 opened 2 years ago
0
Initialize dist with DS

#285 Quentin-Anthony closed 2 years ago
0
MLM adaptation and Multitask Finetuning

#284 lintangsutawika closed 2 years ago
4
Fix tflops glu computation

#283 Muennighoff closed 2 years ago
3
[valid] deadlock workaround

#282 stas00 closed 2 years ago
5
Fix mixed fused layer norm to mimic nn.LayerNorm for torch>1.11

#281 thomasw21 closed 2 years ago
2
DeepSpeedCheckpoint needs to support bf16 optimizer states.

#280 thomasw21 opened 2 years ago
0
[new feature] add a `tag` to the arguments to load the checkpoint from a specific step (not necessarily the latest)

#279 SaulLu opened 2 years ago
0
[doc] start-fast instructions

#278 stas00 closed 2 years ago
0
Thomas/olruwase/sync layer norms

#277 thomasw21 closed 2 years ago
0
Make dataloader use another random generator

#276 thomasw21 closed 2 years ago
4
[WIP] add debug utils

#275 stas00 opened 2 years ago
0
Sync 4 layer norms - bf16, fp32, optimizer states on restart

#274 tjruwase opened 2 years ago
0
`torch.testing.assert_equal` didn't make it

#273 stas00 closed 2 years ago
0
sync layer norms

#272 stas00 closed 2 years ago
1
Sync layer norm

#271 thomasw21 opened 2 years ago
0
Test different layer norm

#270 thomasw21 opened 2 years ago
0
[tensorboard] add rename and remove event tools

#269 stas00 closed 2 years ago
0
[kill switch] fix test

#268 stas00 closed 2 years ago
0
[TB] disable samples-per-dataset, steps-per-dataset, tokens-per-dataset

#267 stas00 closed 2 years ago
0
[kill switch] correct sys.exit

#266 stas00 closed 2 years ago
0
c10d crashes on `sys.exit(0)`

#265 stas00 closed 2 years ago
10
Preprocessing from arrow file to load an HF dataset

#264 TevenLeScao opened 2 years ago
1
launch debug code

#263 stas00 opened 2 years ago
0
[embed norm] switch to apex MixedFusedLayerNorm

#262 stas00 closed 2 years ago
0
allocate embed norm only on pp0

#261 stas00 closed 2 years ago
0
sync the whole Meg-LM fused_kernels sub-tree

#260 stas00 closed 2 years ago
4
Fix softmax

#259 thomasw21 closed 2 years ago
4
deploy elastic error handler

#258 stas00 closed 2 years ago
0
Fix padded vocab size on preprocessing scripts

#257 thomasw21 closed 2 years ago
0
make partition_method configurable

#256 stas00 closed 2 years ago
0
add `pad-vocab-size-to` argument and tests

#255 SaulLu closed 2 years ago
1
