-
Hello! I used auto-gptq to quantize the `llama-2-7b-instruct` model to `llama-2-7b-instruct-4bit-128g`, and I tried to compare the speed between the two, but the result is very strange. The storage of the qu…
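For what it's worth, a minimal sketch of one way to time the two checkpoints side by side (the model paths are the poster's; the prompt and token count are arbitrary placeholders):

```
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

def tokens_per_second(model, tokenizer, prompt, max_new_tokens=128):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
    torch.cuda.synchronize()
    start = time.perf_counter()
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    new_tokens = out.shape[1] - inputs.input_ids.shape[1]
    return new_tokens / (time.perf_counter() - start)

tokenizer = AutoTokenizer.from_pretrained("llama-2-7b-instruct")
fp16 = AutoModelForCausalLM.from_pretrained(
    "llama-2-7b-instruct", torch_dtype=torch.float16, device_map="cuda:0")
quantized = AutoGPTQForCausalLM.from_quantized(
    "llama-2-7b-instruct-4bit-128g", device="cuda:0")

prompt = "Explain quantization in one paragraph."
print("fp16      :", tokens_per_second(fp16, tokenizer, prompt), "tok/s")
print("4bit-128g :", tokens_per_second(quantized, tokenizer, prompt), "tok/s")
```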
-
I am trying to use TRT-LLM RAG with the Mistral 7B model.
I used int8 weight-only quantization when building the TRT engine.
The app launches, but throws an error when an input is passed to …
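For context on the technique itself, a conceptual sketch of int8 weight-only quantization in plain PyTorch (not TensorRT-LLM's actual kernels): weights are stored as int8 with per-output-channel scales and dequantized at matmul time, while activations stay in floating point:

```
import torch

def quantize_weight_int8(w):
    # w: [out_features, in_features]; symmetric per-output-channel absmax scaling
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def int8_weight_only_linear(x, q, scale):
    # Dequantize the weights on the fly; activations keep their own dtype
    w = q.to(x.dtype) * scale.to(x.dtype)
    return x @ w.t()

w = torch.randn(16, 32)
q, s = quantize_weight_int8(w)
x = torch.randn(4, 32)
err = (int8_weight_only_linear(x, q, s) - x @ w.t()).abs().max()
print(f"max abs error vs full-precision matmul: {err:.4f}")
```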
-
Run on a Mac M3 Max with 128 GB.
Run this code:
```
from transformers import AutoModel, AutoTokenizer
MAX_LENGTH = 128
model = AutoModel.from_pretrained("unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4b…
```
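For comparison, a minimal sketch of how this kind of checkpoint normally loads on a CUDA machine, assuming the truncated model ID is `unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4bit`; the repo ships a pre-quantized bitsandbytes 4-bit checkpoint, and bitsandbytes' 4-bit kernels target CUDA in the mainline releases, which is likely relevant on Apple silicon:

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed completion of the truncated model ID above
model_id = "unsloth/Meta-Llama-3.1-405B-Instruct-bnb-4bit"

# The checkpoint's embedded bitsandbytes config is applied automatically;
# this path expects an NVIDIA GPU (the 4-bit kernels are CUDA-only upstream)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)
```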
-
My training set is very large, over a million samples, and reading it all in at once for training causes OOM, so I read the data in streaming mode, but training turned out to be very slow.
GPU utilization is very low,
while the CPU is completely maxed out.
Training arguments:
```
SftArguments(train_type='sft', model_type='internvl2-8b', model_revision='master', full_deter…
```
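When streaming, per-sample decoding and tokenization run on the CPU inside the input pipeline, which can starve the GPU. A minimal sketch (hypothetical file names and preprocessing, not the swift trainer's internals) of overlapping that CPU work with GPU compute via sharded files and DataLoader workers:

```
from datasets import load_dataset
from torch.utils.data import DataLoader

# Stream from several shards so the pipeline can be split across workers
stream = load_dataset(
    "json",
    data_files={"train": ["train-00.jsonl", "train-01.jsonl",
                          "train-02.jsonl", "train-03.jsonl"]},
    split="train",
    streaming=True,
)

def preprocess(example):
    # tokenize / build the training sample here; runs lazily, per record
    return example

loader = DataLoader(
    stream.map(preprocess).with_format("torch"),
    batch_size=8,
    num_workers=4,  # one worker per shard moves CPU work off the main loop
)
```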
-
## 🐛 Bug
## To Reproduce
Steps to reproduce the behavior:
I followed https://captum.ai/tutorials/Llama2_LLM_Attribution
My code is here; the only difference is that I changed the model_…
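For reference, the core calls in that tutorial look roughly like the sketch below (assuming a causal LM is already loaded; `model_path` and the example strings are placeholders):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from captum.attr import FeatureAblation, LLMAttribution, TextTokenInput

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; the poster changed this
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto")

# Wrap a perturbation-based attribution method for LLM generation
fa = FeatureAblation(model)
llm_attr = LLMAttribution(fa, tokenizer)

inp = TextTokenInput("Dave lives in Palm Coast, FL and is a lawyer.", tokenizer)
attr_res = llm_attr.attribute(inp, target="Palm Coast")
```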
-
### Describe the bug
Unable to load the 8B LLaVA model:
https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers
### Is there an existing issue for this?
- [X] I have searched the existing iss…
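For reference, a minimal sketch of the standard transformers LLaVA loading path, which the repo name suggests these weights target (an untested assumption):

```
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "xtuner/llava-llama-3-8b-v1_1-transformers"

# Standard HF LLaVA loading path; fp16 keeps the 8B model on a single GPU
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
```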
-
Question 1:
I trained on my own dataset with PaddleSeg-release-2.8.1, then ran quantization-aware training under PaddleSeg-release-2.8.1/deploy/slim/quant and converted the model from dynamic to static. I then used paddle2onnx to convert the static-graph files to ONNX, but no ONNX file was generated.
(PaddleSeg) D:\PY\PaddleSeg-rele…
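For reference, a typical paddle2onnx invocation over an exported static-graph directory looks like the following (the directory and file names are placeholders; it is worth confirming the dynamic-to-static step actually wrote the `.pdmodel`/`.pdiparams` pair before converting):

```
paddle2onnx --model_dir ./output/quant_static --model_filename model.pdmodel --params_filename model.pdiparams --save_file model.onnx --opset_version 13
```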
-
I fine-tuned llama3.1 8B bnb 4-bit according to your recommendations with my own train+eval dataset and saved it as a merged 16-bit model. I now want to run inference by loading the 16-bit merged model and usin…
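A minimal sketch of loading such a merged 16-bit checkpoint for inference with plain transformers (the local path is hypothetical; Unsloth's `FastLanguageModel.from_pretrained` is the other common route):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

merged_dir = "./llama31-8b-merged-16bit"  # hypothetical local path
tokenizer = AutoTokenizer.from_pretrained(merged_dir)
model = AutoModelForCausalLM.from_pretrained(
    merged_dir, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```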
-
File "web_demo.py", line 129, in
main(args)
File "web_demo.py", line 83, in main
model, tokenizer = get_infer_setting(gpu_device=0, quant=args.quant)
File "/opt/model/infer_util.py",…
-
I'm trying to apply dolphin-mistral's prompt template (ChatML) format:
```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{user_prompt}<|im_end|>
<|im_start|>assistant
```
I've tried this a couple of different ways:
quant_path = "TheBloke/dolphi…