-
### Issue type
Feature Request
### Have you reproduced the bug with TensorFlow Nightly?
No
### Source
binary
### TensorFlow version
v2.13.0-17-gf841394b1b7
### Custom code
No
### OS platform…
-
### Describe the issue
I quantized a simple CNN model in PyTorch and converted it to ONNX. When I tested the runtime of the int8 and fp32 models on CPU, the int8 model was slower. Here is my code:
[Go…
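For context, int8 can run slower than fp32 on CPU when the per-tensor quantize/dequantize overhead outweighs the savings in the integer matmuls. A minimal numpy sketch of the symmetric per-tensor int8 round trip that inserts this extra work (all names are mine, not from the issue's code):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor int8 quantization: scale maps max |x| to 127."""
    scale = float(np.abs(x).max()) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """The matching dequantize step an int8 runtime pays around each op."""
    return q.astype(np.float32) * scale

x = np.random.randn(4, 8).astype(np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize(q, scale)
# Round-trip error is bounded by half a quantization step.
assert np.max(np.abs(x - x_hat)) <= scale * 0.5 + 1e-6
```

On small models these two steps can dominate the runtime, which would explain the measurement.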
-
Nils, is it possible to create an integer-only model so it could run on accelerators or frameworks such as ArmNN?
https://www.tensorflow.org/lite/performance/post_training_quantization#full_integer…
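The full-integer path described in the linked docs replaces floating-point rescaling with an int32 fixed-point multiplier plus a shift, so no float ops remain at inference time. A rough numpy sketch of that requantization step (the decomposition follows the gemmlowp-style scheme; the function names are mine):

```python
import numpy as np

def quantize_multiplier(real_multiplier: float):
    """Decompose a real multiplier in (0, 1) into an int32 fixed-point
    multiplier and a right shift, as integer-only kernels require."""
    shift = 0
    while real_multiplier < 0.5:
        real_multiplier *= 2.0
        shift += 1
    m = int(round(real_multiplier * (1 << 31)))
    return m, shift

def requantize(acc: np.ndarray, multiplier: int, shift: int) -> np.ndarray:
    """Scale an int32 accumulator down to int8 using only integer ops."""
    prod = acc.astype(np.int64) * multiplier
    total_shift = 31 + shift
    # Rounding right shift: add half the divisor, then arithmetic shift.
    rounded = (prod + (1 << (total_shift - 1))) >> total_shift
    return np.clip(rounded, -128, 127).astype(np.int8)

acc = np.array([12345, -6789, 40000], dtype=np.int32)
m, s = quantize_multiplier(0.0003)  # e.g. scale_in * scale_w / scale_out
out = requantize(acc, m, s)
# out == [4, -2, 12], matching round(acc * 0.0003)
```

Since every operation here is an integer multiply, add, or shift, the same graph can execute on integer-only accelerators.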
-
[Info] You are exporting PPQ Graph to TensorRT(Onnx + Json).
Please Compile the TensorRT INT8 engine manually:
from ppq.utils.TensorRTUtil import build_engine
build_engine(onnx_file='Quantized…
-
Are there any runnable demos of using Sparse-QAT/PTQ (2:4) to accelerate inference, such as applying PTQ to a 2:4 sparse LLaMA for inference acceleration? I am curious about the potential speedup rati…
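For reference, the 2:4 pattern itself is simple to sketch: in every contiguous group of four weights along a row, the two smallest magnitudes are zeroed, which is what sparse tensor cores exploit. A minimal numpy illustration, not tied to any particular toolkit:

```python
import numpy as np

def prune_2_4(w: np.ndarray) -> np.ndarray:
    """Zero the 2 smallest-magnitude weights in every group of 4 along
    the last dim, producing the 2:4 structured-sparsity pattern."""
    assert w.shape[-1] % 4 == 0
    groups = w.reshape(-1, 4)
    # Indices of the 2 smallest magnitudes per group.
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (groups * mask).reshape(w.shape)

w = np.random.randn(8, 16).astype(np.float32)
sparse_w = prune_2_4(w)
# Every group of 4 keeps exactly 2 nonzero weights.
assert ((sparse_w.reshape(-1, 4) != 0).sum(axis=1) == 2).all()
```

The achievable speedup then depends on whether the runtime actually dispatches to sparse kernels for the pruned layers, which is exactly what a runnable demo would need to show.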
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussi…
-
**Describe the solution you'd like**
I found that the latest release, TensorRT 8.0, supports int8 quantization on GPU, which greatly accelerates inference.
And now onnxruntime is …
-
### 💡 Your Question
Hi,
I am just checking, I see in the provided results that Yolo-NAS-L does not suffer much reduction in performance going to Yolo-NAS-INT8-L. Can I check what exactly is meant …
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and found no similar bug report.
### Ultralytics YOLO Component
Val
…
-
Loading the model takes about 5 GB of memory; after a few rounds of conversation it jumps to 6 GB, and each additional exchange adds roughly 300 MB. Is there any way to overcome this problem?
==============================
python realtime_chat.py --role_name 三三
-----PERFORM NORM HEAD
user:你好
/home/allen/miniconda3/envs/index…