-
### Describe the issue
With the QNN execution provider, we see that ~800 MB of memory is allocated when loading the first model, and roughly another 100 MB is allocated after each additional model is loaded. When destroyin…
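For context, a minimal sketch of the kind of repro this describes (not the reporter's actual code): create and release several sessions on the QNN EP and watch process memory. The model path and the `QnnHtp.dll` backend path are placeholders.

```python
import gc
import onnxruntime as ort

def make_session(model_path: str) -> ort.InferenceSession:
    # Assumed provider options: HTP backend on Windows.
    qnn_options = {"backend_path": "QnnHtp.dll"}
    return ort.InferenceSession(
        model_path,
        providers=["QNNExecutionProvider"],
        provider_options=[qnn_options],
    )

# Memory reportedly grows with each session created...
sessions = [make_session("model.onnx") for _ in range(5)]

# ...and is reportedly not returned after the sessions are destroyed.
del sessions
gc.collect()
```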
-
A recurring issue I keep seeing is people being unable to run models on their hardware for any number of reasons, one of the biggest being that llama.cpp has not incorporated …
-
I am currently using a Surface Pro 11 to reproduce [AIPC_Inference.md#2-use-directml--onnx-runtime-to-run-phi-3-model](https://github.com/microsoft/Phi-3CookBook/blob/main/md/03.Inference/AIPC_Inference…
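For illustration, a minimal sketch of the DirectML-plus-ONNX-Runtime setup that step refers to, assuming the `onnxruntime-directml` package is installed; the model path is a placeholder, not the cookbook's exact Phi-3 artifact.

```python
import onnxruntime as ort

session = ort.InferenceSession(
    "phi-3-mini.onnx",                   # placeholder model path
    providers=["DmlExecutionProvider"],  # DirectML EP from onnxruntime-directml
)
print(session.get_providers())           # confirm DML is actually being used
```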
-
**System Information (please complete the following information):**
- Model Builder Version: 17.18.4.2425601
- Visual Studio Version: 17.11.4
**Describe the bug**
Starting today I get an excep…
-
I am trying to export a custom GeoCalib model to ONNX.
The model uses LMOptimizer, which accepts a custom class Pinhole (a subclass of BaseCamera) as input. When I attempt to export the model, I en…
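For illustration, a sketch of the usual workaround for custom-class inputs to `torch.onnx.export`: wrap the model so the exported `forward` takes only tensors and rebuilds the custom object inside. Only the LMOptimizer/Pinhole names come from the report; the wrapper, the `from_params` constructor, and the shapes are hypothetical.

```python
import torch

class ExportWrapper(torch.nn.Module):
    def __init__(self, optimizer):
        super().__init__()
        self.optimizer = optimizer

    def forward(self, image: torch.Tensor, intrinsics: torch.Tensor):
        # Hypothetical: rebuild the camera object from a plain tensor of intrinsics
        # so the exported graph only sees tensor inputs.
        camera = Pinhole.from_params(intrinsics)
        return self.optimizer(image, camera)

wrapper = ExportWrapper(lm_optimizer)  # lm_optimizer: the GeoCalib LMOptimizer instance
torch.onnx.export(
    wrapper,
    (torch.randn(1, 3, 224, 224), torch.randn(1, 4)),  # placeholder shapes
    "geocalib.onnx",
    opset_version=17,
)
```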
-
### Describe the issue
The EP_CTX_BLOB (the compiled model saved as an ONNX blob on disk)
seems to have WRITE and EXECUTE permissions enabled.
Since a compiled blob is only meant to be read, I…
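A quick way to inspect the permission bits the report is describing; the context-blob filename is a placeholder.

```python
import os
import stat

mode = os.stat("model_ctx.onnx").st_mode
print(oct(mode & 0o777))  # full permission bits, e.g. 0o777 would include write+execute
print("owner write:", bool(mode & stat.S_IWUSR),
      "owner execute:", bool(mode & stat.S_IXUSR))
```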
-
### Describe the issue
I have an FP16 (half-precision floating point) ONNX model. When I load and execute this model using the onnxruntime library in Python, the first execution is successful and pro…
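For context, a minimal sketch of the load-and-run pattern being described, with a placeholder model path, input name, and shape; the actual model comes from the report.

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model_fp16.onnx", providers=["CPUExecutionProvider"])
input_meta = session.get_inputs()[0]
x = np.random.rand(1, 3, 224, 224).astype(np.float16)  # FP16 input for the FP16 model

out_first = session.run(None, {input_meta.name: x})   # first execution: reported to succeed
out_second = session.run(None, {input_meta.name: x})  # subsequent executions: where the issue appears
```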
-
### Describe the issue
The preprocess step for quantization does not work with the latest onnxruntime version:
```shell
python -m onnxruntime.quantization.preprocess --input image_resize.onnx --outp…
```
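A roughly equivalent call through the Python API may help isolate whether the CLI entry point or the preprocessing itself is failing; this assumes `quant_pre_process` lives in `onnxruntime.quantization.shape_inference` in the installed version, and the output filename is a placeholder.

```python
from onnxruntime.quantization.shape_inference import quant_pre_process

quant_pre_process(
    "image_resize.onnx",               # input model from the report
    "image_resize_preprocessed.onnx",  # placeholder output path
)
```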
-
**Description**
When deploying an ONNX model using the Triton Inference Server's ONNX runtime backend, the inference performance on the CPU is noticeably slower compared to running the same model usi…
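For illustration, a sketch of the standalone baseline that comparison implies: time the same model with onnxruntime directly on CPU, to establish the number the Triton ONNX backend is being measured against. The model path, input name, and shape are placeholders.

```python
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

session.run(None, {name: x})  # warm-up run
start = time.perf_counter()
for _ in range(100):
    session.run(None, {name: x})
print((time.perf_counter() - start) / 100 * 1e3, "ms per inference")
```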
-
### Describe the issue
I encountered the following error when loading a model with dynamic shapes using the QNN Provider as the backend acceleration setting.
![Image](https://github.com/user-attach…
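A common workaround, since the QNN EP generally requires static shapes, is to pin the dynamic dimensions to concrete values before handing the model to the provider. This is a generic onnx-protobuf sketch, not the reporter's model; file names and the chosen size are placeholders.

```python
import onnx

model = onnx.load("model_dynamic.onnx")
for inp in model.graph.input:
    for dim in inp.type.tensor_type.shape.dim:
        if dim.dim_param:             # symbolic dimension, e.g. "batch"
            dim.ClearField("dim_param")
            dim.dim_value = 1         # placeholder static size
onnx.save(model, "model_static.onnx")
```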