EleutherAI / gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
https://www.eleuther.ai/
Apache License 2.0
6.95k stars · 1.02k forks
Issues
#1323 · Error when converting sequential model to HF · SilverSulfide · opened 2 days ago · 0 comments
#1322 · Runtime per step linearly increases with training step number · iPRET · opened 1 week ago · 1 comment
#1321 · Can `preprocess_data.py` support Huggingface Dataset? · cafeii · opened 1 week ago · 1 comment
#1320 · _forward_step_fn does not always return two values so eval.py breaks if is_pipe_parallel is false · markNZed · opened 1 week ago · 2 comments
#1319 · Llama MLP projection layers mismatch with HF config during conversion · Vmjkom · closed 1 week ago · 2 comments
#1318 · Fix documentation for converting SFT/DPO weights back to HF Llama · jacobthebanana · closed 1 week ago · 0 comments
#1317 · KeyError when converting DPO weights from GPTNeoX format to HuggingFace Llama in post-training documentation · jacobthebanana · closed 1 week ago · 0 comments
#1316 · Update text_generation_utils.py to work with pipe_parallel_size of 0 · markNZed · opened 3 weeks ago · 0 comments
#1315 · fix a GQA issue (#1314) · tiandeyu-cs · closed 1 week ago · 0 comments
#1314 · Training crashes when "(hidden_size * num_kv_heads) / (num_attention_heads * num_attention_heads)" is not an integer · tiandeyu-cs · closed 1 week ago · 0 comments
#1313 · Python 3.10 support · markNZed · closed 1 week ago · 1 comment
#1312 · Add support for dropout in sparse attention · michaelc-yu · closed 6 days ago · 0 comments
#1311 · Add default bf16 precision setting when bf16 config option is set but precision is unset · AI-WAIFU · closed 1 week ago · 0 comments
#1310 · [Question] Running gpt-neox on AMD-based LUMI HPC centre · iPRET · closed 3 weeks ago · 1 comment
#1309 · fix 'intermediate_size' in Llama configuration files after the 'mlp_type' option was removed · tiandeyu-cs · closed 1 week ago · 1 comment
#1308 · Add ERROR logging prefix and sort the prefixes alphabetically · TheBatmanofButler · closed 1 month ago · 2 comments
#1307 · DeeperSpeed cannot support BFloat16 and PipelineParallelism · jahatef · opened 1 month ago · 1 comment
#1306 · Latest DeepSpeed not supported · jahatef · opened 1 month ago · 0 comments
#1305 · Error with rotary embeddings and BFloat16 · jahatef · closed 2 days ago · 1 comment
#1304 · CUDA/Pytorch multiprocessing workaround and test fixes · AI-WAIFU · opened 1 month ago · 0 comments
#1303 · pytest-forked alternative to get around CUDA/pytorch multiprocessing limitation · AI-WAIFU · opened 1 month ago · 0 comments
#1302 · adds pyproject files and tests · LouisCastricato · closed 6 days ago · 0 comments
#1301 · Fix failing tests · AI-WAIFU · closed 1 month ago · 0 comments
#1300 · Add additional asserts and update post training readme · AI-WAIFU · closed 1 month ago · 0 comments
#1299 · Add support for context parallelism · bclyang · opened 1 month ago · 1 comment
#1298 · Improve Profiling Docs · Quentin-Anthony · closed 1 month ago · 0 comments
#1297 · TE integration via full TransformerLayer · tf-nv · opened 1 month ago · 0 comments
#1296 · hotfix for tp >= 2 and pp > 2 in autoitercount · AI-WAIFU · closed 1 month ago · 0 comments
#1295 · Re-added RM training removed during merge conflict in KTO · dmahan93 · closed 1 month ago · 0 comments
#1294 · Add KTO Post-training example · dmahan93 · closed 1 month ago · 0 comments
#1293 · update args docs · Quentin-Anthony · closed 1 month ago · 0 comments
#1292 · update neox arg docs · Quentin-Anthony · closed 1 month ago · 1 comment
#1291 · mamba flop calculations · jahatef · closed 1 month ago · 0 comments
#1290 · Fix dataset bug · Quentin-Anthony · closed 2 months ago · 0 comments
#1288 · Reinforce PR · dmahan93 · opened 2 months ago · 1 comment
#1287 · Remove the remaining two hanging wandb config fields · Quentin-Anthony · closed 2 months ago · 0 comments
#1286 · Make monitors consistent · Quentin-Anthony · closed 2 months ago · 0 comments
#1285 · Fix off by 1 error on masked tokens for RM training · dmahan93 · closed 2 months ago · 0 comments
#1284 · Update Comet integration instructions · Lothiraldan · closed 2 months ago · 0 comments
#1283 · Automatically compute train_iters when train_epochs is specified · AI-WAIFU · closed 1 month ago · 1 comment
#1282 · TransformerEngine Integration · aurelion-source · opened 2 months ago · 3 comments
#1281 · Add model parallel group to reduce scatter · bclyang · closed 2 months ago · 0 comments
#1280 · Do not fail when git is not installed · gcaillaut · closed 1 month ago · 1 comment
#1279 · fix the imports needed for comet integration · Quentin-Anthony · closed 2 months ago · 0 comments
#1278 · fix gpt-j residual bias assumption · dmahan93 · closed 2 months ago · 0 comments
#1277 · Post training examples · dmahan93 · closed 2 months ago · 3 comments
#1276 · Hotfix llama models · dmahan93 · closed 2 months ago · 1 comment
#1275 · Add more informative checks for ZeRO incompatibility · AI-WAIFU · closed 2 months ago · 0 comments
#1274 · Fix weight decay module check · aurelion-source · closed 2 months ago · 0 comments
#1273 · Expand Docstring · AI-WAIFU · closed 2 months ago · 0 comments