Montzsuma opened 1 day ago
Hi @Montzsuma,
Yes, the software is tuned for real models. I suggest you try actual CNN models with convolution layers (ResNet, MobileNet, etc.). It is not meant for experimenting with a single operator.
Thanks Uday
Hi @uday610 !
Thanks for the quick reply.
Do you know how RyzenAI decides whether a model will run on the NPU? You mentioned that one operator alone won't run on the NPU, but is there a specific number of operations the software checks for before deciding where it will run? For example, would running MatMul recursively a large number of times be enough for it to run on the NPU, or do only very specific operations qualify?
In addition, I saw this link, specifically the part that says "This graph partitioning and deployment technique across CPU and NPU is fully automated by the VAI EP and is totally transparent to the end-user." Does that mean the execution of a single model can have its load shared between CPU and NPU?
Thanks a lot!
Ryzen AI's official installer flow supports CNN-based models, so the model must have a few convolution layers (at least 2, I think).
Yes, the execution of a single model can be shared between NPU and CPU, depending on the operators. If some operators cannot run on NPU, they will run on CPU, and this happens automatically.
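For anyone reading along, the CPU fallback is driven by the execution-provider list passed to ONNX Runtime: the runtime walks the list in order and assigns each graph partition to the first provider that supports its operators. The provider names below follow the Ryzen AI docs, but treat the exact setup as an assumption to verify against your install. A minimal sketch of the preference logic:

```python
# ONNX Runtime assigns each graph partition to the first provider in the
# list that supports its operators; unsupported ops fall through to the
# next one. Provider names here are assumptions based on the Ryzen AI docs.

PREFERRED = ["VitisAIExecutionProvider", "CPUExecutionProvider"]

def choose_providers(available):
    """Keep the preferred order, dropping providers not present locally."""
    return [p for p in PREFERRED if p in available]

# On a machine without the NPU stack, only the CPU provider survives:
print(choose_providers(["CPUExecutionProvider"]))
# With Ryzen AI installed, both appear, NPU first:
print(choose_providers(["VitisAIExecutionProvider", "CPUExecutionProvider"]))
```

In a real script you would pass the result as `providers=` to `onnxruntime.InferenceSession`; any node the VitisAI EP rejects is then scheduled on the CPU EP automatically, which is the transparent partitioning the docs describe.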
Then, if I'm understanding it right: I can't run single-operator models on the NPU, and even if a model is complex enough to run on the NPU, there is no guarantee that everything in it will actually run there, even when only operations listed in the Model Compatibility page are used, because the framework may decide to optimize performance by sharing the load between CPU and NPU. Is that so?
What I was actually trying to do was benchmark single-operator performance on the NPU: basically, run the operators in the previous link with varying input sizes and compare performance between CPU and NPU.
Hi all,
I modified the hello_world model to perform a single MatMul operation instead of the Conv2d/Relu operations, and I'm unable to make it run on the NPU.
The code is mostly the same; the changes were mostly in the model:
The dummy inputs:
The actual inputs:
I am also looping the inference a few hundred times to check the NPU usage.
I also tried adding the following to vaip_config.json:
To be honest, ChatGPT suggested these properties; I'm not sure where they come from and couldn't find them in the documentation.
Output:
Is there a limitation on simpler models, or is some property missing?
I also just noticed that the quantized model doesn't actually return the matrix multiplication result, just a random array every time I run the script.
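For reference, a correctly quantized MatMul should still track the float result to within quantization error, not return unrelated values on each run. A pure-NumPy sanity check of what "correct" looks like (the symmetric int8 scheme here is an illustrative assumption, not necessarily what the quantizer in this flow produces):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(-1, 1, (8, 8)).astype(np.float32)
b = rng.uniform(-1, 1, (8, 8)).astype(np.float32)

def quantize(x):
    """Symmetric per-tensor int8 quantization; returns (int8 data, scale)."""
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

qa, sa = quantize(a)
qb, sb = quantize(b)

# Integer matmul accumulated in int32, then dequantized by the scale product.
deq = (qa.astype(np.int32) @ qb.astype(np.int32)).astype(np.float32) * sa * sb

ref = a @ b
err = float(np.abs(deq - ref).max())
print(f"max abs error vs float reference: {err:.4f}")
```

The error stays small and deterministic; if the quantized model's output is a different random array every run, something in the quantization or the graph rewrite is broken rather than just lossy.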