-
I made some sample code showing how to use the NPU to run a YOLO model on MP4 files.
It currently runs in real time on my Snapdragon X Elite Dev Box, twice as fast as the YOLOv4 GPU DirectML sample w…
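For anyone curious, running an ONNX model on the Snapdragon NPU generally goes through onnxruntime's QNN execution provider. A minimal sketch, assuming that provider is what the sample uses (the model path and HTP backend option are placeholders, not the sample's actual code):

```python
import onnxruntime as ort

# Sketch: create a session on the QNN execution provider so the model
# runs on the Hexagon NPU, falling back to CPU for unsupported ops.
session = ort.InferenceSession(
    "yolo.onnx",  # hypothetical model path
    providers=["QNNExecutionProvider", "CPUExecutionProvider"],
    provider_options=[{"backend_path": "QnnHtp.dll"}, {}],
)
print(session.get_providers())  # confirm which providers were registered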
-
### Describe the issue
When running this:
```python
import os
def quantize_onnx_model(onnx_model_path, quantized_model_path):
from onnxruntime.quantization import quantize_dynamic, QuantType
…
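For reference, a complete version of such a helper would typically look like the sketch below; the function body and the weight type are assumptions, since the original snippet is cut off:

```python
import os
from onnxruntime.quantization import quantize_dynamic, QuantType

def quantize_onnx_model(onnx_model_path, quantized_model_path):
    # Dynamic quantization: weights are converted to 8-bit integers offline,
    # activations are quantized on the fly at inference time.
    quantize_dynamic(
        onnx_model_path,
        quantized_model_path,
        weight_type=QuantType.QUInt8,  # assumption; QInt8 is also common
    )
    print(f"quantized model: {os.path.getsize(quantized_model_path)} bytes")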
-
### Is there an existing issue for this problem?
- [X] I have searched the existing issues
### Operating system
Windows
### GPU vendor
Nvidia (CUDA)
### GPU model
RTX 3060
### GPU VRAM
12GB
…
-
### Describe the issue
I am trying to load XGBoost ONNX models using onnxruntime on a Windows machine.
The model size is 52 MB, but loading it consumes 1378.9 MB of RAM. The time to load …
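For what it's worth, the kind of measurement being reported can be reproduced roughly like this (a sketch; the model path is a placeholder and `psutil` is assumed to be installed):

```python
import os
import time

import onnxruntime as ort
import psutil  # assumed available; used only for the RSS measurement

proc = psutil.Process(os.getpid())
rss_before = proc.memory_info().rss

start = time.perf_counter()
session = ort.InferenceSession("xgboost_model.onnx",  # placeholder path
                               providers=["CPUExecutionProvider"])
elapsed = time.perf_counter() - start

rss_after = proc.memory_info().rss
print(f"load time: {elapsed:.2f} s, "
      f"RSS growth: {(rss_after - rss_before) / 2**20:.1f} MiB")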
-
### 🐛 Describe the bug
```
torch.onnx.errors.SymbolicValueError: ONNX symbolic expected the output of `%2212 : Tensor = onnx::Squeeze(%2186, %2211), scope: SimpleLSTMNet::/torch.ao.nn.quantized.modu…
```
-
**Is your feature request related to a problem? Please describe.**
The Phi-3 series of models offers SOTA performance, especially in reasoning, math, and coding. Microsoft released the models under…
-
Here are the commands / code to reproduce.
To generate a Llama 3 opset-20 ONNX model:
```
pip install optimum[exporters]
huggingface-cli login
optimum-cli export onnx --model meta-llama/Meta-Llama-3-8B-In…
```
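Once the export finishes, a quick way to sanity-check the resulting model is to load it back through optimum's ONNX Runtime wrapper. A minimal sketch, where the output directory name is an assumption (the original command is truncated before it):

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

out_dir = "./llama3-onnx"  # hypothetical export output directory
tokenizer = AutoTokenizer.from_pretrained(out_dir)
model = ORTModelForCausalLM.from_pretrained(out_dir)

# Smoke test: generate a few tokens through the ONNX session.
inputs = tokenizer("Hello, world", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))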
-
### System Info
transformers v2.17.2
node v18.20.3
### Environment/Platform
- [ ] Website/web-app
- [ ] Browser extension
- [X] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Elect…
-
### Model description
[jinaai/jina-clip-v1](https://huggingface.co/jinaai/jina-clip-v1/tree/main/onnx)
### Prerequisites
- [X] The model is supported in Transformers (i.e., listed [here](https://hu…
-
I have converted Google's [flan-t5-small](https://huggingface.co/google/flan-t5-small) using the `fastT5.export_and_get_onnx_model` method with quantization enabled by default:
```python
import sys, os…
```
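For context, the fastT5 export being described usually reduces to something like the following sketch; the generation example is illustrative, not the issue's actual code:

```python
from fastT5 import export_and_get_onnx_model
from transformers import AutoTokenizer

model_name = "google/flan-t5-small"
# Exports encoder/decoder to ONNX and quantizes them (fastT5's default).
model = export_and_get_onnx_model(model_name)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokens = tokenizer("translate English to German: Hello", return_tensors="pt")
out = model.generate(input_ids=tokens["input_ids"],
                     attention_mask=tokens["attention_mask"])
print(tokenizer.decode(out[0], skip_special_tokens=True))
```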