-
Hello! I used auto-gptq to quantize the `llama-2-7b-instruct` model to `llama-2-7b-instruct-4bit-128g`, and I tried to compare the speed between the two, but the result is very strange. The storage of the qu…
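For what it's worth, a minimal sketch of one way to time the two checkpoints side by side (the model paths are the poster's; the prompt and token count are arbitrary placeholders):

```
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

def tokens_per_second(model, tokenizer, prompt, max_new_tokens=128):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
    torch.cuda.synchronize()
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    new_tokens = out.shape[1] - inputs.input_ids.shape[1]
    return new_tokens / (time.perf_counter() - start)

tokenizer = AutoTokenizer.from_pretrained("llama-2-7b-instruct")
fp16 = AutoModelForCausalLM.from_pretrained(
    "llama-2-7b-instruct", torch_dtype=torch.float16, device_map="cuda:0")
quantized = AutoGPTQForCausalLM.from_quantized(
    "llama-2-7b-instruct-4bit-128g", device="cuda:0")

prompt = "Explain quantization in one paragraph."
print("fp16      :", tokens_per_second(fp16, tokenizer, prompt), "tok/s")
print("4bit-128g :", tokens_per_second(quantized, tokenizer, prompt), "tok/s")
```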
-
I am trying to use TRT-LLM RAG with the Mistral 7B model.
I used int8 weight-only quantization when building the TRT engine.
The app launches, but throws an error when an input is passed to …
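For context on the technique itself, a conceptual sketch of int8 weight-only quantization in plain PyTorch (not TensorRT-LLM's actual kernels): weights are stored as int8 with per-output-channel scales and dequantized at matmul time, while activations stay in floating point:

```
import torch

def quantize_weight_int8(w):
    # w: [out_features, in_features]; symmetric per-output-channel absmax scaling
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def int8_weight_only_linear(x, q, scale):
    # Dequantize the weights on the fly; activations keep their own dtype
    w = q.to(x.dtype) * scale.to(x.dtype)
    return x @ w.t()

w = torch.randn(16, 32)
q, s = quantize_weight_int8(w)
x = torch.randn(4, 32)
err = (int8_weight_only_linear(x, q, s) - x @ w.t()).abs().max()
print(f"max abs error vs full-precision matmul: {err:.4f}")
```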
-
Run on a Mac M3 Max with 128 GB.
Run this code:
```
from transformers import AutoModel, AutoTokenizer
MAX_LENGTH = 128
model = AutoModel.from_pretrained("unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4b…
```
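For comparison, a minimal sketch of how this kind of checkpoint normally loads on a CUDA machine, assuming the truncated model ID is `unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4bit`; the repo ships a pre-quantized bitsandbytes 4-bit checkpoint, and bitsandbytes' 4-bit kernels target CUDA in the mainline releases, which is likely relevant on Apple silicon:

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed completion of the truncated model ID above
model_id = "unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4bit"

# The checkpoint's embedded bitsandbytes config is applied automatically;
# this path expects an NVIDIA GPU (the 4-bit kernels are CUDA-only upstream)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)
```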
-
My training set is very large, over a million samples, and reading it all in at once for training causes OOM, so I read the data in streaming mode, but training turned out to be very slow.
GPU utilization is very low,
while the CPU is completely maxed out.
Training arguments:
```
SftArguments(train_type='sft', model_type='internvl2-8b', model_revision='master', full_deter…
```
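When streaming, per-sample decoding and tokenization run on the CPU inside the input pipeline, which can starve the GPU. A minimal sketch (hypothetical file names and preprocessing, not the swift trainer's internals) of overlapping that CPU work with GPU compute via sharded files and DataLoader workers:

```
from datasets import load_dataset
from torch.utils.data import DataLoader

# Stream from several shards so the pipeline can be split across workers
stream = load_dataset(
    "json",
    data_files={"train": ["train-00.jsonl", "train-01.jsonl",
                          "train-02.jsonl", "train-03.jsonl"]},
    split="train",
    streaming=True,
)

def preprocess(example):
    # tokenize / build the training sample here; runs lazily, per record
    return example

loader = DataLoader(
    stream.map(preprocess).with_format("torch"),
    batch_size=8,
    num_workers=4,  # one worker per shard moves CPU work off the main loop
)
```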
-
## 🐛 Bug
## To Reproduce
Steps to reproduce the behavior:
I followed https://captum.ai/tutorials/Llama2_LLM_Attribution
My code is here; the only difference is that I changed the model_…
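For reference, the core calls in that tutorial look roughly like the sketch below (assuming a causal LM is already loaded; `model_path` and the example strings are placeholders):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from captum.attr import FeatureAblation, LLMAttribution, TextTokenInput

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; the poster changed this
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto")

# Wrap a perturbation-based attribution method for LLM generation
fa = FeatureAblation(model)
llm_attr = LLMAttribution(fa, tokenizer)

inp = TextTokenInput("Dave lives in Palm Coast, FL and is a lawyer.", tokenizer)
attr_res = llm_attr.attribute(inp, target="Palm Coast")
```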
-
### Describe the bug
Unable to load the 8B LLaVA model:
https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers
### Is there an existing issue for this?
- [X] I have searched the existing iss…
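For reference, a minimal sketch of the standard transformers LLaVA loading path, which the repo name suggests these weights target (an untested assumption):

```
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "xtuner/llava-llama-3-8b-v1_1-transformers"

# Standard HF LLaVA loading path; fp16 keeps the 8B model on a single GPU
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
```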
-
Question 1:
I trained on my own dataset with PaddleSeg-release-2.8.1, then ran quantization-aware training under PaddleSeg-release-2.8.1/deploy/slim/quant and converted the model from dynamic to static. I then used paddle2onnx to convert the static-graph files to ONNX, but no ONNX file was generated.
(PaddleSeg) D:\PY\PaddleSeg-rele…
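For reference, a typical paddle2onnx invocation over an exported static-graph directory looks like the following (the directory and file names are placeholders; it is worth confirming the dynamic-to-static step actually wrote the `.pdmodel`/`.pdiparams` pair before converting):

```
paddle2onnx --model_dir ./output/quant_static --model_filename model.pdmodel --params_filename model.pdiparams --save_file model.onnx --opset_version 13
```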
-
I fine-tuned llama3.1 8B bnb 4-bit according to your recommendations with my own train+eval dataset and saved it as a merged 16-bit model. I now want to run inference by loading the 16-bit merged model and usin…
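A minimal sketch of loading such a merged 16-bit checkpoint for inference with plain transformers (the local path is hypothetical; Unsloth's `FastLanguageModel.from_pretrained` is the other common route):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_dir = "./llama31-8b-merged-16bit"  # hypothetical local path
tokenizer = AutoTokenizer.from_pretrained(merged_dir)
model = AutoModelForCausalLM.from_pretrained(
    merged_dir, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```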
-
File "web_demo.py", line 129, in
main(args)
File "web_demo.py", line 83, in main
model, tokenizer = get_infer_setting(gpu_device=0, quant=args.quant)
File "/opt/model/infer_util.py",…
-
I'm trying to apply dolphin-mistral's prompt template (ChatML) format:
```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{user_prompt}<|im_end|>
<|im_start|>assistant
```
I've tried this a couple of different ways:
quant_path = "TheBloke/dolphi…