-
**Describe the bug**
Regarding the `do_sample` argument: whether I pass `--do_sample True` or `--do_sample true`, the log still reports: `do_sample` is set to `False`. However, `temperature` is set to `0.0`
swift infer \
--model_type qwen2-vl-2…
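For what it's worth, this symptom often comes from a boolean CLI flag that never converts its string argument: `bool("False")` is `True` in Python, and some parsers leave the default untouched. Below is a minimal argparse sketch of a converter that would accept both spellings (`str2bool` is an illustrative helper, not part of swift):
```python
import argparse

def str2bool(value: str) -> bool:
    # Accept common spellings so both "--do_sample True" and "--do_sample true" work.
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

parser = argparse.ArgumentParser()
# type=bool would be a bug here: any non-empty string is truthy, so bool("False") == True.
parser.add_argument("--do_sample", type=str2bool, default=False)

print(parser.parse_args(["--do_sample", "True"]).do_sample)   # True
print(parser.parse_args(["--do_sample", "true"]).do_sample)   # True
```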
-
### Your current environment
The output of `python collect_env.py`
```text
(base) root@DESKTOP-PEPA2G9:~# python collect_env.py
Collecting environment information...
/root/miniconda3/lib/py…
-
## Description
Hello, I am performing int8 quantization on a BERT-like embedding model. I noticed that after quantization, the inference speed is much slower than FP16, and the output of the t…
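To make the FP16/INT8 comparison concrete, here is a minimal timing-harness sketch, assuming a BERT-like encoder loadable through transformers (the model name and batch contents are placeholders; the int8 model, however it was produced, can be passed to the same `bench` helper as long as it keeps the usual forward signature):
```python
import time
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"  # placeholder for the actual embedding model
tok = AutoTokenizer.from_pretrained(name)
batch = tok(["a short benchmark sentence"] * 32, return_tensors="pt", padding=True).to("cuda")

def bench(model, iters=50):
    model.eval()
    with torch.inference_mode():
        for _ in range(5):               # warmup
            model(**batch)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(**batch)
        torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters * 1e3  # ms per batch

fp16 = AutoModel.from_pretrained(name, torch_dtype=torch.float16).to("cuda")
print(f"fp16: {bench(fp16):.2f} ms/batch")
# time the int8 engine with the same helper for a like-for-like number
```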
-
After opening an image directory and enabling AI auto-labeling, switching to a different model leaves the previous model resident in memory; each switch leaks more memory until it is exhausted and an error is raised.
![2](https://github.com/user-attachments/assets/1f508867-c967-47e6-9190-fe212e16aaef)
![1](https://github.com/user-attachments/assets/09f97ddb-2006-468d…
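The usual workaround in PyTorch-based tools is to drop every reference to the outgoing model before loading its replacement; a minimal sketch (names are illustrative, assuming `state` holds the only live reference):
```python
import gc
import torch

def switch_model(state: dict, load_new):
    # state["model"] is assumed to be the only live reference to the running model.
    # Release it first; otherwise each switch leaks the previous weights until
    # RAM/VRAM is exhausted.
    state["model"] = None
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    state["model"] = load_new()
```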
-
### 🐛 Describe the bug
The following script results in an error when run with vllm [0.6.1-post2](https://github.com/vllm-project/vllm/releases/tag/v0.6.1.post2) and PyTorch 2.4.
The model is usi…
-
### Is there an existing issue / discussion for this?
- [X] I have searched the existing issues / discussions
### Is there an existing answer for this in the FAQ? …
-
model: qwen1.5 14b chat
auto_gptq: 0.8.0dev+cu118, 0.7.0dev+cu118
quantization code:
```python
quantize_config = BaseQuantizeConfig(
bits=4, # quantize model to 4-bit
group_size=128, # it is rec…
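For reference, a minimal end-to-end sketch of how such a config is typically used with AutoGPTQ (the calibration text, checkpoint name, and output directory are placeholders):
```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained = "Qwen/Qwen1.5-14B-Chat"          # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(pretrained)
quantize_config = BaseQuantizeConfig(bits=4, group_size=128)

# Calibration samples: a real run needs a few hundred representative texts.
examples = [tokenizer("auto-gptq calibration sample text.", return_tensors="pt")]

model = AutoGPTQForCausalLM.from_pretrained(pretrained, quantize_config)
model.quantize(examples)
model.save_quantized("qwen1.5-14b-chat-gptq-4bit")
```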
-
Hello,
I downloaded the model (NCSOFT/Llama-3-OffsetBias-RM-8B) from Hugging Face
and then ran the command below:
```bash
pip install -r requirements.txt
```
and then:
```python
from module import VllmModule
…
-
### Search before asking
- [X] I had searched in the [issues](https://github.com/eosphoros-ai/DB-GPT/issues?q=is%3Aissue) and found no similar issues.
### Operating system information
Linux
### P…
-
Hello,
First of all, thank you for this amazing tool! I was wondering if there is any chance of integrating open-source LMMs such as https://huggingface.co/Qwen/Qwen2-VL-7B-…
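For reference, recent transformers releases (>= 4.45) can load Qwen2-VL directly; a minimal loading sketch, assuming the Qwen2-VL-7B-Instruct checkpoint:
```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"  # assumed checkpoint, based on the linked model family
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)
```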