EleutherAI / gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
https://www.eleuther.ai/
Apache License 2.0
6.95k stars · 1.02k forks
Issues
#1323 · Error when converting sequential model to HF · SilverSulfide · opened 2 days ago · 0 comments
#1322 · Runtime per step linearly increases with training step number · iPRET · opened 1 week ago · 1 comment
#1321 · Can `preprocess_data.py` support Huggingface Dataset? · cafeii · opened 1 week ago · 1 comment
#1320 · _forward_step_fn does not always return two values so eval.py breaks if is_pipe_parallel is false · markNZed · opened 1 week ago · 2 comments
#1319 · Llama MLP projection layers mismatch with HF config during conversion · Vmjkom · closed 1 week ago · 2 comments
#1318 · Fix documentation for converting SFT/DPO weights back to HF Llama · jacobthebanana · closed 1 week ago · 0 comments
#1317 · KeyError when converting DPO weights from GPTNeoX format to HuggingFace Llama in post-training documentation · jacobthebanana · closed 1 week ago · 0 comments
#1316 · Update text_generation_utils.py to work with pipe_parallel_size of 0 · markNZed · opened 3 weeks ago · 0 comments
#1315 · fix a GQA issue (#1314) · tiandeyu-cs · closed 1 week ago · 0 comments
#1314 · Training crashes when "(hidden_size * num_kv_heads) / (num_attention_heads * num_attention_heads)" is not an integer · tiandeyu-cs · closed 1 week ago · 0 comments
#1313 · Python 3.10 support · markNZed · closed 1 week ago · 1 comment
#1312 · Add support for dropout in sparse attention · michaelc-yu · closed 6 days ago · 0 comments
#1311 · Add default bf16 precision setting when bf16 config option is set but precision is unset · AI-WAIFU · closed 1 week ago · 0 comments
#1310 · [Question] Running gpt-neox on AMD-based LUMI HPC centre · iPRET · closed 3 weeks ago · 1 comment
#1309 · fix 'intermediate_size' in Llama configuration files after the 'mlp_type' option was removed · tiandeyu-cs · closed 1 week ago · 1 comment
#1308 · Add ERROR logging prefix and sort the prefixes alphabetically · TheBatmanofButler · closed 1 month ago · 2 comments
#1307 · DeeperSpeed cannot support BFloat16 and PipelineParallelism · jahatef · opened 1 month ago · 1 comment
#1306 · Latest DeepSpeed not supported · jahatef · opened 1 month ago · 0 comments
#1305 · Error with rotary embeddings and BFloat16 · jahatef · closed 2 days ago · 1 comment
#1304 · CUDA/Pytorch multiprocessing workaround and test fixes · AI-WAIFU · opened 1 month ago · 0 comments
#1303 · pytest-forked alternative to get around CUDA/pytorch multiprocessing limitation · AI-WAIFU · opened 1 month ago · 0 comments
#1302 · adds pyproject files and tests · LouisCastricato · closed 6 days ago · 0 comments
#1301 · Fix failing tests · AI-WAIFU · closed 1 month ago · 0 comments
#1300 · Add additional asserts and update post training readme · AI-WAIFU · closed 1 month ago · 0 comments
#1299 · Add support for context parallelism · bclyang · opened 1 month ago · 1 comment
#1298 · Improve Profiling Docs · Quentin-Anthony · closed 1 month ago · 0 comments
#1297 · TE integration via full TransformerLayer · tf-nv · opened 1 month ago · 0 comments
#1296 · hotfix for tp >= 2 and pp > 2 in autoitercount · AI-WAIFU · closed 1 month ago · 0 comments
#1295 · Re-added RM training removed during merge conflict in KTO · dmahan93 · closed 1 month ago · 0 comments
#1294 · Add KTO Post-training example · dmahan93 · closed 1 month ago · 0 comments
#1293 · update args docs · Quentin-Anthony · closed 1 month ago · 0 comments
#1292 · update neox arg docs · Quentin-Anthony · closed 1 month ago · 1 comment
#1291 · mamba flop calculations · jahatef · closed 1 month ago · 0 comments
#1290 · Fix dataset bug · Quentin-Anthony · closed 2 months ago · 0 comments
#1288 · Reinforce PR · dmahan93 · opened 2 months ago · 1 comment
#1287 · Remove the remaining two hanging wandb config fields · Quentin-Anthony · closed 2 months ago · 0 comments
#1286 · Make monitors consistent · Quentin-Anthony · closed 2 months ago · 0 comments
#1285 · Fix off by 1 error on masked tokens for RM training · dmahan93 · closed 2 months ago · 0 comments
#1284 · Update Comet integration instructions · Lothiraldan · closed 2 months ago · 0 comments
#1283 · Automatically compute train_iters when train_epochs is specified · AI-WAIFU · closed 1 month ago · 1 comment
#1282 · TransformerEngine Integration · aurelion-source · opened 2 months ago · 3 comments
#1281 · Add model parallel group to reduce scatter · bclyang · closed 2 months ago · 0 comments
#1280 · Do not fail when git is not installed · gcaillaut · closed 1 month ago · 1 comment
#1279 · fix the imports needed for comet integration · Quentin-Anthony · closed 2 months ago · 0 comments
#1278 · fix gpt-j residual bias assumption · dmahan93 · closed 2 months ago · 0 comments
#1277 · Post training examples · dmahan93 · closed 2 months ago · 3 comments
#1276 · Hotfix llama models · dmahan93 · closed 2 months ago · 1 comment
#1275 · Add more informative checks for ZeRO incompatibility · AI-WAIFU · closed 2 months ago · 0 comments
#1274 · Fix weight decay module check · aurelion-source · closed 2 months ago · 0 comments
#1273 · Expand Docstring · AI-WAIFU · closed 2 months ago · 0 comments