-
I am interested in using the Mamba2 model with the `transformers` library. However, I've encountered several issues and have some questions:
1. **Model Accessibility:** It seems the Mamba2 model is…
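For reference, here is a minimal loading sketch, assuming a `transformers` release that ships Mamba2 support and a checkpoint already converted to the `transformers` format (the model id below is a placeholder, not a confirmed repository):
```
# Minimal sketch, assuming Mamba2 support is available in the installed transformers version.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "org/mamba2-checkpoint"  # placeholder id; substitute a real Mamba2 repo
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Hello, Mamba2!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```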
-
Fuse some popular functions and automatically replace modules in an existing 🤗 transformers model with their corresponding fusion modules.
**APIs**
```
from pipegoose.nn import fusion
# and ot…
```
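As an illustration of the idea (not pipegoose's actual API), a module-replacement pass over a 🤗 transformers model could look like the following sketch, which swaps every `nn.LayerNorm` for a hypothetical fused drop-in:
```
# Illustrative sketch only: a generic module-replacement pass in plain PyTorch,
# not pipegoose's actual fusion API.
import torch.nn as nn
from transformers import AutoModel

class FusedLayerNorm(nn.LayerNorm):
    """Hypothetical stand-in for a fused LayerNorm kernel; same math as nn.LayerNorm."""
    pass

def to_fused(ln: nn.LayerNorm) -> FusedLayerNorm:
    # Build the replacement with the same shape/eps and copy the affine parameters over.
    fused = FusedLayerNorm(ln.normalized_shape, eps=ln.eps, elementwise_affine=ln.elementwise_affine)
    fused.load_state_dict(ln.state_dict())
    return fused

def replace_modules(module: nn.Module, target: type, convert) -> nn.Module:
    # Walk the module tree and swap every `target` instance for its converted version.
    for name, child in module.named_children():
        if isinstance(child, target):
            setattr(module, name, convert(child))
        else:
            replace_modules(child, target, convert)
    return module

model = AutoModel.from_pretrained("bert-base-uncased")
model = replace_modules(model, nn.LayerNorm, to_fused)
```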
-
```
/usr/local/cuda-11.1/bin/nvcc -I/home/hugh/anaconda3/envs/gptserv/lib/python3.9/site-packages/torch/include -I/home/hugh/anaconda3/envs/gptserv/lib/python3.9/site-packages/torch/include/torch/c…
```
-
Currently, multi-LoRA supports only Llama and Mistral architectures. We should extend this functionality to all architectures.
Yi, Qwen, Phi and Mixtral architectures seem to be the most demanded r…
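For context, here is a rough sketch of what multi-LoRA looks like at the modeling level with 🤗 PEFT on a non-Llama base model; the adapter ids are hypothetical, and this is not this project's serving API:
```
# Illustration only: hypothetical adapter ids, using 🤗 PEFT rather than this
# project's multi-LoRA implementation.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-7B")                # Qwen base model
model = PeftModel.from_pretrained(base, "user/adapter-a", adapter_name="a")   # first LoRA
model.load_adapter("user/adapter-b", adapter_name="b")                        # second LoRA
model.set_adapter("b")  # switch the active adapter between requests
```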
-
Hello,
It seems that currently int8 weight-only and SmoothQuant quantization are supported for GPT models, but no quantization is supported for other autoregressive transformer models, suc…
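For background, SmoothQuant migrates activation outliers into the weights with a per-channel scale before int8 quantization; a minimal NumPy sketch of that scaling step (framework-agnostic, not tied to any particular backend) is:
```
# Minimal sketch of the SmoothQuant scaling step (per-channel scale migration).
import numpy as np

def smooth_scales(activations, weight, alpha=0.5):
    # activations: (tokens, in_features); weight: (in_features, out_features)
    act_max = np.abs(activations).max(axis=0)            # per-input-channel activation range
    w_max = np.abs(weight).max(axis=1)                   # per-input-channel weight range
    return (act_max ** alpha) / (w_max ** (1 - alpha))   # s_j = max|X_j|^a / max|W_j|^(1-a)

X = np.random.randn(64, 128).astype(np.float32)
W = np.random.randn(128, 256).astype(np.float32)
s = smooth_scales(X, W)
# (X / s) @ (s * W) equals X @ W, but X / s has a flatter range and is easier to quantize.
assert np.allclose((X / s) @ (W * s[:, None]), X @ W, atol=1e-3)
```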
-
When I tried
```
!python qlora.py --learning_rate 0.0001 --model_name_or_path EleutherAI/gpt-neox-20b --trust_remote_code
```
in Colab, I got the following errors:
```
2023-06-03 13:54:17.113623: W t…
-
### Description
```shell
Model: Gpt-NeoX
GPU: A100
Tritonserver version: 22.12
```
Hello, I'm not sure whether this is a FasterTransformer issue or a backend issue, but I'm still reporting i…
-
File "/UNICOMFS/hitsz_mzhang_1/.conda/envs/quantize/lib/python3.9/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 155, in forward
qkv = self.query_key_value(hidden_states…
-
Hello, looking at your batch_view.py, I found that the data is not shuffled, whereas in the gpt-neox library the data is shuffled.
So I want to confirm whether or not the author shuffled the data during t…
-
SparseAttention relies on Triton for specific kernels. GPT-NeoX currently pins `triton==0.4.2` as a dependency, which is behind the `1.0.0` version that DeepSpeed uses. It is far behind the version of Triton …