-
### OpenVINO Version
2024.0.0
### Operating System
Ubuntu 20.04 (LTS)
### Device used for inference
None
### OpenVINO installation
PyPi
### Programming Language
Python
### Hardware Architect…
-
In 2013, there were two important improvements to Product Quantization. The non-parametric solution of Optimized Product Quantization [2] was shown to be equivalent to Cartesian k-means [1] and performed better than …
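For context, the core idea that both methods build on can be sketched in a few lines: split each vector into subvectors and learn a small codebook per subspace with k-means. This is a minimal, self-contained illustration of plain Product Quantization (not the optimized/rotated variants of [1] or [2]); all sizes are arbitrary example values.

```python
import numpy as np

def train_pq(X, n_subvectors=4, n_centroids=16, n_iter=10, seed=0):
    """Train a Product Quantizer: split each vector into subvectors and
    run a small k-means independently in each subspace."""
    rng = np.random.default_rng(seed)
    d = X.shape[1] // n_subvectors
    codebooks = []
    for m in range(n_subvectors):
        sub = X[:, m * d:(m + 1) * d]
        # Initialize centroids from random training points.
        centroids = sub[rng.choice(len(sub), n_centroids, replace=False)]
        for _ in range(n_iter):
            # Assign each subvector to its nearest centroid.
            dists = np.linalg.norm(sub[:, None, :] - centroids[None], axis=2)
            assign = dists.argmin(axis=1)
            # Update centroids as the mean of their assigned points.
            for k in range(n_centroids):
                if (assign == k).any():
                    centroids[k] = sub[assign == k].mean(axis=0)
        codebooks.append(centroids)
    return codebooks

def encode_pq(X, codebooks):
    """Encode each vector as one centroid index per subspace."""
    d = X.shape[1] // len(codebooks)
    codes = np.empty((len(X), len(codebooks)), dtype=np.int64)
    for m, centroids in enumerate(codebooks):
        sub = X[:, m * d:(m + 1) * d]
        dists = np.linalg.norm(sub[:, None, :] - centroids[None], axis=2)
        codes[:, m] = dists.argmin(axis=1)
    return codes

X = np.random.default_rng(1).normal(size=(256, 32)).astype(np.float32)
books = train_pq(X)
codes = encode_pq(X, books)
print(codes.shape)  # each 32-dim vector is stored as 4 small codes
```

The compression comes from replacing each 32-dimensional float vector with 4 one-byte indices; the optimized variants above additionally learn a rotation of the space before splitting.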
-
OS: tested on both Ubuntu 20.04 and macOS (M1)
cat foo.py
```python
import torch
def bar():
    return 3
bar()
```
Running this command, `viztracer` will hang:
```console
viztracer --log_func_a…
-
I am accelerating a custom PyTorch network using Vitis-AI. After following the steps below, the model is quantized and the .xmodel is compiled; however, the model's accuracy takes a huge hit going …
-
Hi everyone,
I’m working on a project that involves deploying a YOLOv10 model on a mobile/edge device. To improve inference speed and reduce the model size, I want to convert my YOLOv10 model to Te…
-
Hello, I am trying to implement PTQ (post-training quantization).
In particular, layer fusion is essential for proceeding with static quantization.
When using the E2E conformer model of espnet1, conv, line…
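For reference, the arithmetic behind Conv/Linear + BatchNorm fusion is independent of the framework: at inference time the BN affine transform is folded into the preceding layer's weights and bias. Below is a minimal NumPy sketch for the linear case (hypothetical shapes, not the ESPnet- or PyTorch-specific fusion API), just to show why the fused layer is exactly equivalent:

```python
import numpy as np

def fuse_linear_bn(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold y = BN(Wx + b) into a single linear layer y = W'x + b'.
    BN(z) = gamma * (z - mean) / sqrt(var + eps) + beta is applied per
    output channel, so it rescales each row of W and shifts b."""
    scale = gamma / np.sqrt(var + eps)
    W_fused = W * scale[:, None]
    b_fused = (b - mean) * scale + beta
    return W_fused, b_fused

rng = np.random.default_rng(0)
out_dim, in_dim = 8, 16
W = rng.normal(size=(out_dim, in_dim))
b = rng.normal(size=out_dim)
gamma, beta = rng.normal(size=out_dim), rng.normal(size=out_dim)
mean, var = rng.normal(size=out_dim), rng.uniform(0.5, 2.0, size=out_dim)

x = rng.normal(size=in_dim)
# Reference: linear layer followed by batch norm (inference mode).
z = W @ x + b
ref = gamma * (z - mean) / np.sqrt(var + 1e-5) + beta
# Fused: a single linear layer producing the same output.
Wf, bf = fuse_linear_bn(W, b, gamma, beta, mean, var)
fused = Wf @ x + bf
print(np.allclose(ref, fused))  # True
```

For a 2D convolution the same folding applies per output channel; the practical problem in models like the conformer is usually locating which module pairs are adjacent so a fusion pass can be pointed at them.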
-
## Description
I generated a calibration cache for a Vision Transformer ONNX model using the EntropyCalibration2 method. When trying to generate an engine file from the cache file for INT8 precision using trte…
-
### OpenVINO Version
2021.2.1.0
### Operating System
Windows System
### Device used for inference
CPU
### OpenVINO installation
Build from source
### Programming Language
C++
### Hardware Ar…
-
### System Info
- CPU architecture: x86_64
- CPU/Host memory size: 250GB total
- GPU properties
- GPU name: 2x NVIDIA A100 80GB
- GPU memory size: 160GB total
- Libraries
- tensorrt @ fi…
-
### Experiment Plan
- Which tuning method is the most memory-efficient?
- Which quantization method yields the highest accuracy at inference time?
#### Comparison group for memory usage during finetuning
1. Full finetuning
2. LoRA tuning
3. llm.int8() + L…