-
### Describe the issue
The DequantizeLinear, Pad, and QuantizeLinear operations in the statically quantized model are not fused into one operation when using the optimization level ORT_ENABLE_EXTENDED. My…
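For reference, the QuantizeLinear/DequantizeLinear pair implements affine quantization, which is what a fused DQ→Pad→Q path would have to preserve. A minimal pure-Python sketch of the per-tensor math (function names are hypothetical, not the ONNX Runtime API):

```python
def quantize_linear(x, scale, zero_point):
    # QuantizeLinear: q = saturate(round(x / scale) + zero_point), clamped to int8
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize_linear(q, scale, zero_point):
    # DequantizeLinear: x ~ (q - zero_point) * scale
    return (q - zero_point) * scale

# Round-tripping a value shows the quantization error introduced per element
x = 1.234
q = quantize_linear(x, scale=0.1, zero_point=0)      # 12
x_hat = dequantize_linear(q, scale=0.1, zero_point=0)  # 1.2
```

Because Pad with a constant value can be expressed directly on the int8 tensor, the three ops can in principle collapse into a single integer-domain operation, which is what the missing fusion would provide.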
-
### 🚀 The feature, motivation and pitch
I have recently been exploring `torch.export`-based quantization and encountered significant slow-downs in inference performance, particularly with per-cha…
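For context on why per-channel schemes can hit slower kernels: they keep one scale per output channel instead of a single scale for the whole tensor. A minimal pure-Python sketch of how the two scale sets differ (illustrative only, not the PyTorch quantizer API):

```python
def per_tensor_scale(weight_rows):
    # One symmetric int8 scale for the entire weight tensor
    flat = [abs(v) for row in weight_rows for v in row]
    return max(flat) / 127.0

def per_channel_scales(weight_rows):
    # One symmetric int8 scale per output channel (here: per row)
    return [max(abs(v) for v in row) / 127.0 for row in weight_rows]

# A tiny 2x2 weight: channel 1 has a much smaller range than channel 0,
# so per-channel scales preserve its precision at the cost of extra bookkeeping.
w = [[0.5, -1.0], [0.1, 0.2]]
```

The extra per-channel scale vector is what forces some backends off their fastest fused kernels.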
-
### Problem Description
Running `runTrace.sh` for the vLLM benchmark failed.
### Operating System
Ubuntu 22.04 in the Docker image rocm/vllm-dev:20241025-tuned
### CPU
AMD EPYC 9654 96-Core Processor
### GPU
A…
-
### System Info
Hello, I am trying to load Mistral-Nemo-Instruct-2407 in bnb 4-bit on 4 A10 GPUs on an EC2 instance.
I upgraded all the packages.
I still face a CUDA out-of-memory error when the train batc…
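As a sanity check on whether 4 A10s (24 GB each) should fit the weights at all, a back-of-the-envelope footprint calculation for 4-bit weights (numbers illustrative; real usage adds activations, KV cache, CUDA context, and optimizer state when training):

```python
def weight_footprint_gb(n_params, bits_per_param):
    # Raw weight storage only; excludes activations, KV cache, and gradients
    return n_params * bits_per_param / 8 / 1e9

# Mistral-NeMo has roughly 12B parameters
print(weight_footprint_gb(12e9, 4))   # 6.0 -> ~6 GB of weights in 4-bit
print(weight_footprint_gb(12e9, 16))  # 24.0 -> ~24 GB in fp16, for comparison
```

So the 4-bit weights alone fit easily; when OOM still occurs during training, the activation memory from the batch size and sequence length is the usual culprit rather than the weights themselves.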
-
Dear Developers,
I am very new to TensorRT and quantization. Previously I only used the basic TensorRT example to generate engines in FP16, because I thought INT8 would compromise accuracy signific…
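For background, TensorRT's INT8 path needs per-tensor activation scales obtained from calibration; the simplest scheme (absmax/max calibration) maps the largest observed activation magnitude onto the int8 limit. A toy sketch of that idea in pure Python (not the TensorRT calibrator API):

```python
def absmax_scale(activations):
    # Map the largest observed magnitude to the int8 limit 127
    return max(abs(a) for a in activations) / 127.0

def quantize_int8(x, scale):
    # Quantize one activation value with the calibrated scale
    q = round(x / scale)
    return max(-128, min(127, q))

# Calibration data observed on a hypothetical layer
acts = [-3.5, 0.2, 2.9, 1.1]
s = absmax_scale(acts)  # 3.5 / 127
```

How well this preserves accuracy depends on the activation distribution: with good calibration data, INT8 often stays close to FP16 accuracy, which is why it is worth benchmarking rather than ruling out up front.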
-
### System Info
```shell
platform: Linux Ubuntu Server 20.04 x64
Python 3.8.10 (default, May 26 2023, 14:05:08)
[GCC 9.4.0] on linux
python packages:
Package Version …
-
**System information**
TensorFlow version (you are using): TF 2.13.0
Are you willing to contribute it (Yes/No): No
Describe the feature and the current behavior/state.
Dear TF developers, I'm …
-
(venv) PS D:\python\LangChain-ChatGLM-Webui-master> python app.py
No sentence-transformers model found with name C:\Users\Administrator/.cache\torch\sentence_transformers\GanymedeNil_text2vec-base-ch…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch…
-
With the release of the new [Mistral NeMo 12B model](https://mistral.ai/news/mistral-nemo/) we now have weights that were pre-trained with FP8. It would be great if Unsloth could support 8bit as well …