-
I read through the code, and my understanding is that prefill does the preprocessing while the main inference happens in the decode phase. Why does the code run prefill on the NPU while the important decode phase runs on the CPU?
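For context, a conceptual sketch of the split the question describes; all names here are hypothetical and not from the project's code. Prefill processes the whole prompt as one large fixed-shape batch (matmul-heavy and compute-bound, which suits an NPU's static graphs), while decode emits one token at a time in small, memory-bound steps:

```python
# Hypothetical sketch of hybrid prefill/decode dispatch; npu_prefill and
# cpu_decode stand in for backend-specific implementations.
def generate(prompt_ids, max_new_tokens, npu_prefill, cpu_decode):
    # Prefill: one pass over all prompt tokens at once, building the KV cache.
    kv_cache, next_id = npu_prefill(prompt_ids)
    out = [next_id]
    # Decode: sequential single-token steps, each rereading the growing cache.
    for _ in range(max_new_tokens - 1):
        kv_cache, next_id = cpu_decode(next_id, kv_cache)
        out.append(next_id)
    return out
```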
-
It would be great to see if AI Toolkit can leverage the NPU in Copilot PCs.
Currently this uses the CPU; it's nice and quick on the Snapdragon processors, but it's not using the AI processor when running mod…
-
Hi,
Could you share a sample Android app to run Llama-v2-7B-Chat Quantized INT4 on my Android device?
Your sample command "python -m qai_hub_models.models.llama_v2_7b_chat_quantized.export"
generate…
-
Greetings,
I'm currently trying to compile the following model:
```python
import brevitas.nn as qnn
import torch.nn as nn
from brevitas.quant import (
    Int8ActPerTensorFixedPoint as ActQu…
```
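For reference, a self-contained sketch along the same lines, assuming the truncated import aliases `Int8ActPerTensorFixedPoint` (the alias names and layer shapes below are guesses, not the reporter's actual model):

```python
import torch
import torch.nn as nn
import brevitas.nn as qnn
from brevitas.quant import (
    Int8ActPerTensorFixedPoint as ActQuant,
    Int8WeightPerTensorFixedPoint as WeightQuant,
)

class QuantBlock(nn.Module):
    def __init__(self):
        super().__init__()
        # Quantize the input activations, then a weight-quantized conv + ReLU.
        self.quant_inp = qnn.QuantIdentity(act_quant=ActQuant, return_quant_tensor=True)
        self.conv = qnn.QuantConv2d(3, 16, 3, weight_quant=WeightQuant, return_quant_tensor=True)
        self.relu = qnn.QuantReLU(act_quant=ActQuant, return_quant_tensor=True)

    def forward(self, x):
        return self.relu(self.conv(self.quant_inp(x)))

model = QuantBlock().eval()
out = model(torch.randn(1, 3, 32, 32))
```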
-
Does QNN support inputs with dynamic dimensions? For example, the width and height of the input images are usually not fixed in super-resolution models. Are there any plans to implement this feature?
ONNX Runtime supp…
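As a point of comparison, a minimal sketch of what ONNX Runtime accepts via dynamic axes at export time (the tiny model here is a stand-in, not a real super-resolution network):

```python
import torch
import torch.nn as nn

# Stand-in for a super-resolution model: any fully convolutional net
# naturally accepts variable H and W.
model = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1)).eval()

torch.onnx.export(
    model,
    torch.randn(1, 3, 64, 64),          # dummy shape used only for tracing
    "sr_model.onnx",
    input_names=["lr_image"],
    output_names=["sr_image"],
    dynamic_axes={                       # mark H and W as symbolic dimensions
        "lr_image": {2: "height", 3: "width"},
        "sr_image": {2: "height", 3: "width"},
    },
)
```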
-
### 🐛 Describe the bug
setup-with-qnn.sh
### Versions
examples/demo-apps/android/LlamaDemo/setup-with-qnn.sh: 68: pushd: not found
I get this specific error while running the script. What is causing it?
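For what it's worth, a likely cause (an assumption, not confirmed in the issue) is that `pushd` is a bash builtin that POSIX `sh` lacks; a quick repro sketch:

```python
# Demonstrates that pushd exists under bash but not under POSIX sh (dash on
# many distros), which matches the "pushd: not found" error above.
import subprocess

for shell in ("sh", "bash"):
    r = subprocess.run(
        [shell, "-c", "pushd /tmp > /dev/null && echo ok"],
        capture_output=True, text=True,
    )
    print(shell, "->", (r.stdout or r.stderr).strip())
# Typical output:
#   sh -> sh: 1: pushd: not found
#   bash -> ok
```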
-
### The issue
I have compiled onnxruntime with QNN backend support for arm64-v8a, Android API 34. My final goal is to use this in the sherpa-onnx (k2) encoder-decoder model, running on the QNN HTP backend. B…
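For reference, a minimal sketch of selecting the QNN execution provider in a Python onnxruntime build (the model path is a placeholder for the sherpa-onnx encoder; on-device C++ usage mirrors the same provider options):

```python
import onnxruntime as ort

# backend_path points at the QNN HTP backend library shipped with the QNN SDK.
sess = ort.InferenceSession(
    "encoder.onnx",  # placeholder for the sherpa-onnx encoder model
    providers=[
        ("QNNExecutionProvider", {"backend_path": "libQnnHtp.so"}),
        "CPUExecutionProvider",  # fallback for ops the HTP backend rejects
    ],
)
print(sess.get_providers())  # verify QNNExecutionProvider actually loaded
```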
-
**Describe the bug**
Simulations with high-order QNN PennyLane derivatives with shots (dfdxdp, dfdxdx, dfdxdop) give different results when compared to qiskit_shots, qiskit_statevector and pennylan…
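For orientation, a minimal sketch of a shot-based higher-order parameter-shift derivative of the kind being compared (simplified to a single parameter; the report's dfdxdp terms mix input and parameter derivatives):

```python
import pennylane as qml
from pennylane import numpy as np

# Finite-shot device: derivative estimates carry sampling noise.
dev = qml.device("default.qubit", wires=1, shots=10000)

# max_diff=2 lets the parameter-shift rule be applied twice.
@qml.qnode(dev, diff_method="parameter-shift", max_diff=2)
def circuit(p):
    qml.RX(p, wires=0)
    return qml.expval(qml.PauliZ(0))

p = np.array(0.3, requires_grad=True)
d2 = qml.grad(qml.grad(circuit))(p)   # estimates d2/dp2 cos(p) = -cos(p)
print(d2)
```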
-
Hi,
mllm-qnn works on my device, an OPPO Find X7 Ultra (Snapdragon 8 Gen 3, 16 GB RAM).
However, the prefill speed for Qwen1.5-1.8B is approximately 4-6 tokens per second, which significantly diverges fro…
-
Could we add tensor_filter::qnn, or support QNN as a delegate of the other filters (tflite / onnxruntime / executorch)? A sketch of the pattern this would extend follows below.
QNN: The Qualcomm® AI Engine Direct SDK provides lower-level, unified APIs for AI development…
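To illustrate, a sketch of an NNStreamer pipeline built from Python: `framework=tensorflow-lite` is the existing tensor_filter pattern, and a `framework=qnn` tag in the same slot is exactly the hypothetical addition being proposed (model file names are placeholders):

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# Existing pattern: tensor_filter dispatches to the tflite framework.
# The request would allow framework=qnn (hypothetical) in the same slot.
pipeline = Gst.parse_launch(
    "videotestsrc num-buffers=10 ! videoconvert ! videoscale ! "
    "video/x-raw,width=224,height=224,format=RGB ! tensor_converter ! "
    "tensor_filter framework=tensorflow-lite model=mobilenet_v1.tflite ! "
    "tensor_sink"
)
pipeline.set_state(Gst.State.PLAYING)
```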