-
### Describe the issue
The DequantizeLinear, Pad, and QuantizeLinear operations in the statically quantized model are not fused into one operation when using the optimization level ORT_ENABLE_EXTENDED. My…
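For reference, the QuantizeLinear/DequantizeLinear pair implements affine quantization, which is what a fused DQ→Pad→Q path would have to preserve. A minimal pure-Python sketch of the per-tensor math (function names are hypothetical, not the ONNX Runtime API):

```python
def quantize_linear(x, scale, zero_point):
    # QuantizeLinear: q = saturate(round(x / scale) + zero_point), clamped to int8
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize_linear(q, scale, zero_point):
    # DequantizeLinear: x ~ (q - zero_point) * scale
    return (q - zero_point) * scale

# Round-tripping a value shows the quantization error introduced per element
x = 1.234
q = quantize_linear(x, scale=0.1, zero_point=0)      # 12
x_hat = dequantize_linear(q, scale=0.1, zero_point=0)  # 1.2
```

Because Pad with a constant value can be expressed directly on the int8 tensor, the three ops can in principle collapse into a single integer-domain operation, which is what the missing fusion would provide.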
-
### 🚀 The feature, motivation and pitch
I have recently been exploring `torch.export`-based quantization and encountered significant slow-downs in inference performance, particularly with per-cha…
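For context on why per-channel schemes can hit slower kernels: they keep one scale per output channel instead of a single scale for the whole tensor. A minimal pure-Python sketch of how the two scale sets differ (illustrative only, not the PyTorch quantizer API):

```python
def per_tensor_scale(weight_rows):
    # One symmetric int8 scale for the entire weight tensor
    flat = [abs(v) for row in weight_rows for v in row]
    return max(flat) / 127.0

def per_channel_scales(weight_rows):
    # One symmetric int8 scale per output channel (here: per row)
    return [max(abs(v) for v in row) / 127.0 for row in weight_rows]

# A tiny 2x2 weight: channel 1 has a much smaller range than channel 0,
# so per-channel scales preserve its precision at the cost of extra bookkeeping.
w = [[0.5, -1.0], [0.1, 0.2]]
```

The extra per-channel scale vector is what forces some backends off their fastest fused kernels.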
-
### Problem Description
Running `runTrace.sh` for the vLLM benchmark failed.
### Operating System
Ubuntu 22.04 in the Docker image rocm/vllm-dev:20241025-tuned
### CPU
AMD EPYC 9654 96-Core Processor
### GPU
A…
-
### System Info
Hello, I am trying to load Mistral-Nemo-Instruct-2407 in bnb 4-bit on 4 A10 GPUs on an EC2 instance.
I upgraded all the packages.
I still face a CUDA out-of-memory error when the train batc…
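As a sanity check on whether 4 A10s (24 GB each) should fit the weights at all, a back-of-the-envelope footprint calculation for 4-bit weights (numbers illustrative; real usage adds activations, KV cache, CUDA context, and optimizer state when training):

```python
def weight_footprint_gb(n_params, bits_per_param):
    # Raw weight storage only; excludes activations, KV cache, and gradients
    return n_params * bits_per_param / 8 / 1e9

# Mistral-NeMo has roughly 12B parameters
print(weight_footprint_gb(12e9, 4))   # 6.0 -> ~6 GB of weights in 4-bit
print(weight_footprint_gb(12e9, 16))  # 24.0 -> ~24 GB in fp16, for comparison
```

So the 4-bit weights alone fit easily; when OOM still occurs during training, the activation memory from the batch size and sequence length is the usual culprit rather than the weights themselves.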
-
Dear Developers,
I am very new to TensorRT and quantization. Previously I only used the basic TensorRT example to generate engines in FP16, because I thought INT8 would compromise accuracy signific…
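For background, TensorRT's INT8 path needs per-tensor activation scales obtained from calibration; the simplest scheme (absmax/max calibration) maps the largest observed activation magnitude onto the int8 limit. A toy sketch of that idea in pure Python (not the TensorRT calibrator API):

```python
def absmax_scale(activations):
    # Map the largest observed magnitude to the int8 limit 127
    return max(abs(a) for a in activations) / 127.0

def quantize_int8(x, scale):
    # Quantize one activation value with the calibrated scale
    q = round(x / scale)
    return max(-128, min(127, q))

# Calibration data observed on a hypothetical layer
acts = [-3.5, 0.2, 2.9, 1.1]
s = absmax_scale(acts)  # 3.5 / 127
```

How well this preserves accuracy depends on the activation distribution: with good calibration data, INT8 often stays close to FP16 accuracy, which is why it is worth benchmarking rather than ruling out up front.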
-
### System Info
```shell
platform: Linux Ubuntu Server 20.04 x64
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
python packages:
Package Version …
-
**System information**
TensorFlow version (you are using): TF 2.13.0
Are you willing to contribute it (Yes/No): No
Describe the feature and the current behavior/state.
Dear TF developers, I'm …
-
(venv) PS D:\python\LangChain-ChatGLM-Webui-master> python app.py
No sentence-transformers model found with name C:\Users\Administrator/.cache\torch\sentence_transformers\GanymedeNil_text2vec-base-ch…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch…
-
With the release of the new [Mistral NeMo 12B model](https://mistral.ai/news/mistral-nemo/) we now have weights that were pre-trained with FP8. It would be great if Unsloth could support 8bit as well …