-
Hi,
I'm using your repo to run inference on a live camera stream with YOLOv5 (Jetson Nano, DeepStream 6).
The engine works well with FP32 or FP16.
However, I'm trying to convert the model with in…
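For reference, DeepStream's nvinfer element selects precision through its `network-mode` property, and INT8 additionally needs a calibration cache generated beforehand. A minimal config sketch (file names here are placeholders, not from the original issue):

```ini
# nvinfer config fragment for INT8 inference
[property]
# 0=FP32, 1=INT8, 2=FP16
network-mode=1
# calibration cache produced during engine building (placeholder path)
int8-calib-file=calib.table
model-engine-file=model_b1_gpu0_int8.engine
```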
-
```
# bits, mode = (8, 'kmeans_lut') if int8 else (16, 'linear') if half else (32, None)
# if bits < 32:
# if MACOS: # quantization only supported on macOS
# with warning…
```
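The commented-out lines above pick 8-bit `kmeans_lut` weight quantization (Core ML's k-means lookup-table mode, which is macOS-only). The idea behind a k-means LUT can be sketched in plain Python, assuming a toy 1-D k-means: weights are clustered into at most `2**bits` centroids, each weight is replaced by the index of its nearest centroid, and the centroids are stored once as a lookup table.

```python
import random

def kmeans_lut_quantize(weights, bits=8, iters=10, seed=0):
    """Toy 1-D k-means LUT quantization: returns (lut, indices)."""
    k = min(2 ** bits, len(set(weights)))
    random.seed(seed)
    lut = random.sample(sorted(set(weights)), k)  # initial centroids
    for _ in range(iters):
        # assign each weight to its nearest centroid
        buckets = [[] for _ in range(k)]
        for w in weights:
            i = min(range(k), key=lambda c: abs(w - lut[c]))
            buckets[i].append(w)
        # move each centroid to the mean of its bucket
        lut = [sum(b) / len(b) if b else lut[i] for i, b in enumerate(buckets)]
    indices = [min(range(k), key=lambda c: abs(w - lut[c])) for w in weights]
    return lut, indices

def dequantize(lut, indices):
    """Reconstruct approximate weights from the lookup table."""
    return [lut[i] for i in indices]
```

With `bits=8` every weight becomes a one-byte index plus a shared 256-entry table, which is where the size saving comes from.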
-
I am encountering a data type mismatch error when using 8-bit quantization with the PEFT library and SFTTrainer for fine-tuning a language model. The error occurs during the generation phase after loa…
-
### 1. System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Microsoft Windows 11 Home 10.0.22631, build 22631
- TensorFlow installation (pip package or built…
-
RTX 4090 24G,
Qwen-7B-Chat
loads OK:
```
model_config = ModelConfig(lora_infos={
"lora_1": conf['lora_1'],
"lora_2": conf['lora_2'],
})
model = ModelFactory.from_huggingface(conf['b…
```
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussions) and fou…
-
### OpenVINO Version
OpenVINO 2022.3
### Operating System
Windows
### Device used for inference
CPU
### Framework
ONNX
### Model used
yolov8
### Issue description
I have a picture tha…
-
TensorRT-LLM has great potential for allowing people to run larger models efficiently with limited hardware resources. Unfortunately, the current quantization workflow requires significant computation…
-
May I ask whether the current project supports INT8 quantization? If so, how? Currently only FP16 and FP32 quantization are supported, right?
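For context on what INT8 support usually means: floats are mapped to 8-bit integers with a shared scale factor. A minimal symmetric per-tensor sketch (an illustration of the general technique, not this project's actual implementation):

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: x ~= q * scale, q in [-127, 127]."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid zero scale
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Map INT8 codes back to approximate floats."""
    return [x * scale for x in q]
```

Each stored value drops from 4 bytes (FP32) to 1 byte, at the cost of the rounding error visible after dequantization.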
-
## detail | Detailed description
1. After quantization the bin size only shrank by half, but going from float32 to int8 it should shrink to a quarter of the original size. That seems odd; could someone clarify?
2. After quantization, inference became about 10% slower (Cortex ×2, thread number 4). Why would it get slower? The model I'm using is YOLOv5 nano ×0.5.