amd / RyzenAI-SW

MIT License

How do I know if the IPU device is employed? #21

Open · ocwins opened this issue 10 months ago

ocwins commented 10 months ago

I have successfully run the demo "ipu_modelsx4_demo", but whether the IPU device is enabled or disabled in Device Manager, the demo runs flawlessly, and I can't tell any difference.

I suggest releasing a single executable that runs some simple tests to check whether everything related to the IPU is working. It would be even better if there were a demo that does not need any environment setup (conda, Python, etc.), so that everybody, not only developers, could use it to test their hardware.

Another suggestion is examples in pure C/C++ and other low-level tools. In our recent projects, we use pure C/C++/CUDA for inference. To be honest, that makes life much easier. With Ryzen AI, we still don't want to adopt any sophisticated solutions, but we need the low-level interfaces/tools and examples of how to use them.

A project like CUTLASS from NVIDIA is a good showcase. We don't use it, but from its code we can easily learn how to use their hardware effectively and efficiently.

uday610 commented 10 months ago

@ocwins , thanks for offering these suggestions. Let us review your suggestions internally.

The demo is possibly running on CPU when you disabled the IPU device.
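One way to check whether a run can use the IPU at all is to ask ONNX Runtime which execution providers it registered for the session. This is a minimal sketch assuming the Ryzen AI build of onnxruntime exposes a provider named `VitisAIExecutionProvider` that accepts a `config_file` option; `open_session` is a hypothetical helper, and `model.onnx` / `vaip_config.json` are placeholder paths. `InferenceSession`, `providers`, `provider_options` and `get_providers()` are standard onnxruntime APIs.

```python
# Sketch only: check whether ONNX Runtime registered the Vitis AI EP.
# Assumption: the provider name and its "config_file" option match the
# Ryzen AI onnxruntime build in use; adjust both for your release.

def vitisai_registered(providers):
    """Return True if the Vitis AI EP is among the session's registered providers."""
    return "VitisAIExecutionProvider" in providers


def open_session(model_path, config_file):
    """Hypothetical helper: prefer the Vitis AI EP and list CPU as the fallback."""
    import onnxruntime as ort  # imported lazily so vitisai_registered() has no dependency

    return ort.InferenceSession(
        model_path,
        providers=["VitisAIExecutionProvider", "CPUExecutionProvider"],
        # One options dict per provider, in the same order as `providers`.
        provider_options=[{"config_file": config_file}, {}],
    )
```

Usage: `print(vitisai_registered(open_session("model.onnx", "vaip_config.json").get_providers()))`. A registered provider only means the EP was loaded; the graph partitioner can still place most or all nodes on the CPU, so this check complements, rather than replaces, a per-device breakdown of where the nodes ended up.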

ocwins commented 10 months ago

A program that shows detailed information, and ideally can run some benchmarks, would be best (with or without source code).

And at the moment, with the demo, the FPS and CPU usage do not change significantly when I disable the IPU device. How can I know whether the IPU is properly employed? Is there any output that can be used to tell the difference?

ocwins commented 10 months ago

I did some more investigation, and it seems that the "ipu_modelsx4_demo" demo always runs on the CPU regardless of whether the IPU device is enabled or not.

Is this file useful?

vitisai_ep_report.json
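If the report records how many graph nodes were assigned to each device, it answers the question directly. The sketch below assumes a schema that may not match every release: a top-level `deviceStat` list whose entries carry `name` (for example `"all"`, `"CPU"`, `"DPU"`) and `nodeNum`. Inspect your own file's keys before relying on it; `node_counts` and `load_counts` are hypothetical helper names.

```python
import json

# Sketch, ASSUMING vitisai_ep_report.json contains a "deviceStat" list of
# {"name": ..., "nodeNum": ...} entries. The schema is an assumption.

def node_counts(report):
    """Map each device name in the report to the number of nodes assigned to it."""
    return {entry["name"]: entry.get("nodeNum", 0) for entry in report.get("deviceStat", [])}


def load_counts(path):
    """Read a report file from disk and return its per-device node counts."""
    with open(path, encoding="utf-8") as f:
        return node_counts(json.load(f))
```

Usage: `print(load_counts("vitisai_ep_report.json"))`. If every node is counted under the CPU and none under the accelerator entry, the model most likely ran entirely on the CPU, which would be consistent with the behavior described above.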

rejectcookies commented 10 months ago

@uday610 Please have the team view this https://www.asrock.com/microsite/aiquickset/index.html

ASRock has made an easy-to-use application. Unfortunately, they have decided to restrict it to their own GPUs, but it is exactly what a consumer-friendly Ryzen AI app should be. I just want to install an app and start prompting images, audio and text. :)

ocwins commented 10 months ago

In my opinion, we should not go that far at the current stage.

First, we need a program to test and benchmark our hardware. A simple MMA (matrix multiply-accumulate) demo with a correctness check could achieve this goal.
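The correctness-check half of such an MMA demo can be sketched on the host side: generate reproducible int8 inputs, compute the reference D = A × B + C exactly, and compare it element by element with what the device returns. How the job reaches the IPU is deliberately not shown; any `run_on_device` call would be hypothetical. Exact comparison assumes integer accumulation without saturation or requantization; if a real kernel rescales its outputs, the reference has to model that too.

```python
import random

# Host-side sketch of an MMA correctness check (D = A x B + C).
# Submitting the job to the IPU is out of scope; device_out is whatever
# a (hypothetical) run_on_device(a, b, c) call would return.

def random_int8_matrix(rows, cols, seed):
    """Deterministic matrix of int8-range values so failures are reproducible."""
    rng = random.Random(seed)
    return [[rng.randint(-128, 127) for _ in range(cols)] for _ in range(rows)]


def mma_reference(a, b, c):
    """Exact integer D = A x B + C, with A: M x K, B: K x N, C: M x N."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def mismatches(a, b, c, device_out):
    """Return (row, col) positions where the device result differs from the reference."""
    ref = mma_reference(a, b, c)
    return [(i, j) for i, row in enumerate(ref) for j, want in enumerate(row)
            if device_out[i][j] != want]
```

An empty list from `mismatches` means the device result matched bit for bit; reporting positions rather than a single boolean makes it easier to spot patterns such as a whole tile being wrong.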

As I currently understand it, the IPU is hardware that runs binaries, and AMD currently provides two of them: 1x4 and 5x4. Running the 1x4 binary gives a single application/program 2 TOPS of computing power, and users can run up to 5 applications that use the IPU at the same time. The 5x4 binary gives a single application 10 TOPS, but users can run only one such application at a time.

A demo that only does arithmetic, without any other business logic, could prove that the IPU works well and can be pushed to its maximum computing power. And with the source code of such a demo, developers, especially real programmers, could learn how to use the IPU to its full capability.
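"Pushed to its maximum computing power" can be made measurable: an M×K by K×N multiply-accumulate performs 2·M·N·K operations (one multiply and one add per term), so timing the kernel gives achieved TOPS, which can be compared against the peak. The peak values below are the per-application figures quoted in this thread for the 1x4 and 5x4 binaries, not official specifications.

```python
# Achieved throughput and utilization for an MMA benchmark.
# PEAK_TOPS holds the per-application figures quoted in this thread,
# not official AMD specifications.
PEAK_TOPS = {"1x4": 2.0, "5x4": 10.0}


def mma_ops(m, n, k):
    """Each of the M*N outputs needs K multiplies and K adds: 2*M*N*K ops."""
    return 2 * m * n * k


def achieved_tops(m, n, k, seconds):
    """Operations per second, expressed in tera-operations per second."""
    return mma_ops(m, n, k) / seconds / 1e12


def utilization(m, n, k, seconds, config):
    """Fraction of the quoted peak for the given binary ("1x4" or "5x4")."""
    return achieved_tops(m, n, k, seconds) / PEAK_TOPS[config]
```

For example, a 10000×10000×10000 MMA that finishes in 1.0 s achieves 2.0 TOPS, which would be full utilization of a 1x4 slot but only 20% of the 5x4 figure.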

Personally, I do not like that ONNX (and other framework) code. I want to know how to operate the IPU directly, how to submit inputs, and where to get the results. If we could learn how to build a binary that runs on the IPU, it would be perfect.

AI apps are a set of applications, not the foundation that makes AI work, and programs/examples/frameworks of or for AI are too far from the hardware. Simple examples that use the IPU through ONNX or Vitis AI(?) are somewhat closer, but still not suitable for testing or demonstrating the IPU itself, and the LLM examples are too heavy for this purpose, which is to make users and developers familiar with the hardware.

I know companies have their strategies and policies for a given period. The decision makers may not be willing to give developers outside the company everything under the hood.

What I can say is that a demo/example that is as small as possible (within the company's restrictions), and that tells us whether the hardware works properly and whether the maximum computing power can be achieved, is necessary at this very early stage.

A tool like this would be good for the RyzenAI-SW team too. When users or developers run into trouble, you could ask them to run this tool to prove that their environments are configured properly. @uday610, @andyluo7

Apps that show what the IPU can do at a high level are good, but a tool that checks the hardware and its capability is a must. That is my opinion.

rejectcookies commented 10 months ago

https://www.youtube.com/watch?v=IVPT6scMaaw

that publication date current stage

If you only knew how bad things really are.

ocwins commented 10 months ago

What should I say...

" A thousand mile trip begins with one step. " ;p

As far as I can see, the distance from the current stage to a stage where everything is fully usable and useful is not that far. Much of the work beyond the current stage has already been done; the trouble is that some of the work at (or before) the current stage has not been done, or has not been done well.

uday610 commented 10 months ago

Hi @ocwins, the multi-model demo has just been updated in https://github.com/amd/RyzenAI-SW/pull/27; you may try it. Regarding your other suggestion, the team is developing a low-level utility/tool to interact with the IPU standalone/independently. Hopefully we will be able to provide an early-access release soon; stay tuned.

Thank you,

ocwins commented 10 months ago

Regarding my suggestion, thank you (and the team) for listening.

The multi-model demo is still problematic. There are typos in generate_script.py:

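# Corrected version. pythonpath_value is assumed to be defined earlier in generate_script.py.
# Without the commas, Python implicitly concatenates adjacent string literals, so the
# PYTHONPATH command ran into the DEBUG_ONNX_TASK command on the same .bat line.
bat_file = ["set XLNX_VART_FIRMWARE=%cd%\\..\\1x4.xclbin\n",
            "set PATH=%cd%\\..\\bin;%cd%\\..\\python;%cd%\\..;%PATH%\n",  # FIXED: trailing comma was missing
            "set PYTHONPATH=" + pythonpath_value + ";%PYTHONPATH%\n",  # FIXED: "\n" and trailing comma were missing
            "set DEBUG_ONNX_TASK=0\n",
            "set DEBUG_DEMO=0\n",
            "set NUM_OF_DPU_RUNNERS=4\n",
            "set XLNX_ENABLE_GRAPH_ENGINE_PAD=1\n",
            "set XLNX_ENABLE_GRAPH_ENGINE_DEPAD=1\n",
            "%cd%\\..\\bin\\ipu_multi_models.exe %cd%\\config\\"]  # FIXED: "\i" is an invalid escape sequence; use "\\"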
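One way to make this class of typo harder to introduce is to keep each batch command as its own list element with no embedded `"\n"`, add the line breaks only when writing, and run a quick sanity check first. This is a sketch, not the project's code: `bat_lines`, `suspicious_lines` and `write_bat` are hypothetical helpers, and the commands simply mirror the snippet above.

```python
# Sketch: one command per element, newlines added only at write time.
# bat_lines, suspicious_lines and write_bat are hypothetical helpers.

def bat_lines(pythonpath_value):
    """The commands from the generate_script.py snippet, one per element, no newlines."""
    return [
        "set XLNX_VART_FIRMWARE=%cd%\\..\\1x4.xclbin",
        "set PATH=%cd%\\..\\bin;%cd%\\..\\python;%cd%\\..;%PATH%",
        "set PYTHONPATH=" + pythonpath_value + ";%PYTHONPATH%",
        "set DEBUG_ONNX_TASK=0",
        "set DEBUG_DEMO=0",
        "set NUM_OF_DPU_RUNNERS=4",
        "set XLNX_ENABLE_GRAPH_ENGINE_PAD=1",
        "set XLNX_ENABLE_GRAPH_ENGINE_DEPAD=1",
        "%cd%\\..\\bin\\ipu_multi_models.exe %cd%\\config\\",
    ]


def suspicious_lines(lines):
    """Heuristic: flag elements containing a newline or more than one 'set ' command,
    which is what a missing comma (implicit string concatenation) tends to produce."""
    return [line for line in lines if "\n" in line or line.count("set ") > 1]


def write_bat(path, lines):
    """Write the batch file with CRLF line endings after the sanity check."""
    bad = suspicious_lines(lines)
    if bad:
        raise ValueError(f"malformed batch lines: {bad!r}")
    # newline="\r\n" makes Python translate every "\n" written into CRLF.
    with open(path, "w", newline="\r\n") as f:
        f.write("\n".join(lines) + "\n")
```

The check is only a heuristic, but it would have caught the merged PATH/PYTHONPATH/DEBUG_ONNX_TASK line before the .bat file was written.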

With the corrected script, ipu_multi_models.exe pops up a message box complaining that "glog.dll was not found" and fails to run. The old version didn't have this problem.

lidachang-amd commented 10 months ago

Hi @ocwins. In my conda environment, glog.dll was installed along with Anaconda3. Could you please try running conda install glog and then run the program again? BTW, my Anaconda installer version is Anaconda3-2023.07-2-Windows-x86_64.
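Before rerunning the demo, it can help to confirm which directories on PATH actually contain glog.dll in the activated environment. This minimal sketch covers only the PATH portion of DLL lookup: in the standard Windows search order, several locations (such as the application directory and the system directories) are checked before PATH, so a miss here is a hint, not proof. `find_on_path` is a hypothetical helper name.

```python
import os

# Sketch: list the PATH directories that contain a given file (e.g. glog.dll).
# Only the PATH part of the Windows DLL search order is covered here.

def find_on_path(filename, path_value=None):
    """Return every PATH directory that contains `filename`, in PATH order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        # Skip empty entries produced by leading, trailing or doubled separators.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits
```

Usage: run `print(find_on_path("glog.dll"))` from the same conda prompt that launches the demo. More than one hit can also matter, since the first match on PATH wins.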

ocwins commented 10 months ago

Hi @lidachang-amd ,

The glog problem was resolved by installing it.

But there is a new problem. The console output is: FAIL : LoadLibrary failed with error 127 "" when trying to load "C:\RyzenAI\SW\demo\multi-model-exec\bin\onnxruntime_vitisai_ep.dll", and a message box shows something similar.

The demo works if we replace onnxruntime_vitisai_ep.dll with its previous version. But, as with the previous version, it also runs fine without the IPU (disabled in Device Manager), and the CPU usage is the same as when the IPU is enabled.

So it's hard to tell whether the IPU was employed in these experiments.

lidachang-amd commented 10 months ago

Did you delete the cache (C:\temp{User_name}\vaip.cache) first? Alternatively, you can set XLNX_ENABLE_CACHE=0 to disable the cache.
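To disable the cache for a single run without changing the system-wide environment, the variable can be set only for the child process. A minimal sketch: `XLNX_ENABLE_CACHE` comes from the suggestion above, while `env_with_cache_disabled` and `run_without_cache` are hypothetical helpers and the command is a placeholder you supply.

```python
import os
import subprocess

# Sketch: run a command with XLNX_ENABLE_CACHE=0 for that process only.
# The rest of the environment (PATH etc.) is copied so DLL lookup is unchanged.

def env_with_cache_disabled(base=None):
    """Copy `base` (default: the current environment) and set XLNX_ENABLE_CACHE=0."""
    env = dict(os.environ if base is None else base)
    env["XLNX_ENABLE_CACHE"] = "0"
    return env


def run_without_cache(cmd):
    """Run `cmd` (a list of arguments) with the cache disabled; return its exit code."""
    return subprocess.run(cmd, env=env_with_cache_disabled(), check=False).returncode
```

Copying the whole environment rather than passing a minimal one keeps PATH intact, so the demo still finds the same DLLs as in a normal run and only the cache setting differs.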

ocwins commented 10 months ago

No difference is observed after deleting the cache or setting XLNX_ENABLE_CACHE.

BTW, the onnxruntime_vitisai_ep.dll provided in this version may not be compiled properly. The message box says that the entry point could not be found. There could be a name mismatch caused by C++ name mangling.
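Windows error 127 is ERROR_PROC_NOT_FOUND ("The specified procedure could not be found"). During LoadLibrary it can also mean that the DLL imports a function that one of its dependencies does not export, for example when two DLLs come from different builds, so name mangling is one possible cause among others. The sketch below checks whether a library exports the names you expect; ctypes looks functions up by their exported name, so a C++-mangled export will not match the plain name. `missing_exports` and `check_dll` are hypothetical helpers, and the entry-point names must come from the message box, not from this sketch.

```python
import ctypes

# Sketch: check whether a DLL/shared library exports the entry points a caller expects.
# ctypes resolves functions by exported name (GetProcAddress on Windows), so a
# mangled C++ export such as "?f@@YAXXZ" will not be found under the name "f".

def missing_exports(lib, names):
    """Return the names that `lib` does not expose."""
    return [name for name in names if not hasattr(lib, name)]


def check_dll(path, names):
    """Load `path` and list the expected entry points it does not export.

    Raises OSError if the library itself cannot be loaded; on Windows this is
    where a LoadLibrary failure such as error 127 would surface.
    """
    lib = ctypes.CDLL(path)
    return missing_exports(lib, names)
```

Usage (placeholders): `check_dll(r"C:\path\to\dependency.dll", ["name_from_message_box"])`. An empty list means every listed name was found.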