-
### Feature request
Add a MistralForQuestionAnswering class to the [modeling_mistral.py](https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py) so …
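Such a class would presumably mirror the existing `LlamaForQuestionAnswering`: a linear head mapping hidden states to per-token start/end logits, then best-span selection at inference. A minimal pure-Python sketch of that span-selection step (names and the `max_len` cap are illustrative, not the actual transformers API):

```python
# Sketch of the extractive-QA decoding such a head would perform: given
# per-token start/end logits, pick the highest-scoring valid span with
# start <= end. The real class would subclass MistralPreTrainedModel and
# produce these logits with a qa_outputs linear layer.

def best_span(start_logits, end_logits, max_len=30):
    """Return (start, end) maximizing start_logits[s] + end_logits[e], s <= e."""
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score, best = score, (s, e)
    return best

start = [0.1, 2.0, 0.3, 0.2]
end   = [0.0, 0.5, 3.0, 0.1]
print(best_span(start, end))  # (1, 2)
```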
-
Add the option to load models in bfloat16 and float16. This is especially important for large models like GPT-J and GPT-NeoX.
Ideally, load from HuggingFace in this low precision, do weight processing on the CPU,…
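In transformers this is typically requested via the `torch_dtype` argument to `from_pretrained` (e.g. `torch_dtype=torch.bfloat16`). For context, here is a small pure-Python sketch of what bfloat16 actually is: float32 with the low 16 mantissa bits dropped, which is why it keeps float32's exponent range and avoids the overflows float16 hits on large values. (The round-to-zero truncation here is a simplification; real conversions usually round-to-nearest-even.)

```python
import struct

def to_bfloat16(x: float) -> float:
    """Truncate a float32 value to bfloat16 precision (keep top 16 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# 1e38 is representable in bfloat16 but would overflow float16 (max ~65504).
print(to_bfloat16(1e38))
# Only ~3 significant decimal digits survive the 8-bit mantissa:
print(to_bfloat16(3.1415926))  # 3.140625
```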
-
**Describe the bug**
When I run `python inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B --retrieval`,
it reports a RuntimeError: The size of tensor a (2048) must match the size of tensor b…
-
When I use this model to ask some questions, after a moment it runs out of memory.
Traceback (most recent call last):
File "/export/openChatKit/openChatKit/inference/bot.py", l…
-
@eric-weiss-zyphra discovered that upstream Megatron-LM is still on the old dataloader scheme (as opposed to gpt-neox), leading to overflow errors like:
```
File "torch/utils/data/_utils/collate.p…
-
I have configured this setup on a 4-GPU machine using the Docker image. I receive the context prompt, but when I feed it an example, it falls apart. Can you please help me to…
-
### Your current environment
```text
Versions of relevant libraries:
[pip3] flashinfer==0.0.9+cu121torch2.3
[pip3] numpy==1.26.4
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] sentence-transformers==3.0…
-
### System Info
```shell
optimum: 1.8.6
transformers: 4.29.2
ubuntu/CUDA 11.7/pytorch 2.1
```
### Who can help?
@fxmarty @younesbelkada
### Information
- [ ] The official example scripts
- [X]…
-
Or do we also need to integrate aspects such as Megatron-LM, or migrate fully to GPT-NeoX?
-
Hi, I am able to run the 2.8B version. Here are some sample inputs/outputs I am able to get:
```
Input: the quick brown fox
jumps over the lazy dog"I'm not sure if this is the best way to do thi…