-
Stating why one should choose this framework over others (GPT-NeoX with DeepSpeed/Megatron, Accelerate+HF, etc.) would make the decision easier for new users. (The timing vs ease…
-
I have implemented an adapter for GPTNeoX following the instructions in the documentation. It passed all tests, but during training of the language adapter the prediction head was trained as well. Do you …
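A minimal sketch of one way to keep the head frozen, using plain PyTorch rather than any documented library switch; the "adapter" name filter and the `EleutherAI/gpt-neox-20b` checkpoint are assumptions for illustration, while `embed_out` is GPT-NeoX's output head in Transformers:
```python
# Hedged sketch, not the library's prescribed fix: freeze everything except
# parameters whose names look like adapter weights, then make sure the
# prediction head (embed_out in GPT-NeoX) stays frozen as well.
from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")

for name, param in model.named_parameters():
    # Assumption: adapter parameters contain "adapter" in their names.
    param.requires_grad = "adapter" in name.lower()

# embed_out is the output (prediction) head of GPTNeoXForCausalLM.
for param in model.embed_out.parameters():
    param.requires_grad = False
```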
-
Hello!
I am using GPTNeoX for decoding, and I need to pass some parameters during forward propagation, such as the length penalty.
I also use Transformers to call model.generate() for gener…
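For reference, decoding-time parameters such as the length penalty can be passed directly to `generate()`; a short sketch (the model name is chosen for illustration), noting that `length_penalty` only takes effect with beam search:
```python
from transformers import AutoTokenizer, GPTNeoXForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
# length_penalty is applied by the beam scorer, so num_beams must be > 1.
outputs = model.generate(
    **inputs,
    max_new_tokens=32,
    num_beams=4,
    length_penalty=1.2,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```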
-
Pythia is a suite of 16 LLMs, all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. The suite was developed with the intention of facilitating research in ma…
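For context, the checkpoints load through the standard Transformers API; a minimal sketch assuming the `EleutherAI/pythia-70m` hub name for the smallest model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Other sizes follow the same naming scheme on the Hub
# (e.g. EleutherAI/pythia-160m, ..., EleutherAI/pythia-12b).
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m")
```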
-
### Branch/Tag/Commit
main
### Docker Image Version
/docker/Dockerfile.torch
### GPU name
A100
### CUDA Driver
470.129.06
### Reproduced Steps
```shell
./bin/gptneox_exampl…
-
Following up on the discussion from #1110
`generate()` with `GPTNeoXForCausalLM` and pythia-70m checkpoints is not working
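A minimal reproduction sketch for this report, assuming the `EleutherAI/pythia-70m` checkpoint and greedy decoding:
```python
from transformers import AutoTokenizer, GPTNeoXForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-70m")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
# Greedy decoding; the report is that this call fails with pythia-70m weights.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```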
-
### Model description
Hugging Face has the GPT-NeoX model by EleutherAI. It's a 20-billion-parameter autoregressive language model trained on the Pile, whose weights will be made freely and openly available…
-
In the example for adding to gptneox_mem_req, I see that n_layers comes from num_hidden_layers in the config.json file, but where do the 512, 512, and 1024 values come from? Maybe a comment in the docu…
-
During recent experiments, if `n_threads` is left as `None` (the default value), initialization of `Llama` and `Gptneox` may hang on some platforms.
Currently, it's recommended to run:
…
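A hedged illustration of passing an explicit thread count instead of `None`; the constructor calls below are hypothetical and depend on the bindings in use:
```python
import os

# Derive an explicit thread count instead of relying on None/auto-detection.
n_threads = max(1, (os.cpu_count() or 1) - 1)

# Hypothetical constructor calls; the real argument names come from the
# bindings being used (Llama / Gptneox in this project).
# model = Llama(model_path="path/to/model.bin", n_threads=n_threads)
# model = Gptneox(model_path="path/to/model.bin", n_threads=n_threads)
```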
-
Thanks for the excellent work. Following the comment in #59, I am trying to train `dmoe_760m` using 16 GPUs (2 nodes) by changing the distributed arguments for a two-node setup, but it is very slow in t…