Xilinx / inference-server

https://xilinx.github.io/inference-server/
Apache License 2.0

Error with resnet50 example on V70 #205

Open varunsh-xilinx opened 1 year ago

varunsh-xilinx commented 1 year ago

Describe the bug

Running the resnet50 example on V70 results in an error.

To Reproduce

Steps to reproduce the behavior:

  1. Install a V70 resnet50 model
  2. Run the example script and pass the model

Expected behavior

The example script should work on V70.

Screenshots

WARNING: Logging before InitGoogleLogging() is written to STDERR
F20230615 07:55:15.982091  2202 dpu_runner.cpp:207] [UNILOG][FATAL][VART_RUNNER_CONSTRUCTION_FAIL][Cannot create runner] cannot open library! lib=libvart-dpu-runner.so, error=ERROR CODE: libvart-dpu-runner.so: cannot open shared object file: No such file or directory
*** Check failure stack trace: ***
./run.sh: line 33:  2167 Aborted                 (core dumped) python3 vitis.py --model ../../resnet_v1_50_tf/resnet_v1_50_tf.xmodel
make: *** [run] Error 134

Mahdi-CV commented 1 year ago

I am working on the exact same issue. I am trying to create a dev Docker image for Vitis + TF ZenDNN + PT ZenDNN. I ran into the issue mentioned here, but adding the precompiled VART libraries from the Vitis Docker image got me past it. However, I am now running into a new issue:

Running the Vitis example for ResNet50 in Python
No server detected. Starting locally...
Waiting until the server is ready...
Loading worker...
WARNING: Logging before InitGoogleLogging() is written to STDERR
F20230808 17:45:01.050848 8074 serialize_v2.cpp:705] [UNILOG][FATAL][XIR_READ_PB_FAILURE][failed to read pb file] file = ../../external/artifacts/resnet50/resnet_v1_50_tf.
Check failure stack trace:
^CAborted (core dumped)

PS: All the steps I followed are documented here: https://auperatechvancouver.sharepoint.com/:w:/g/EVfnCjbCivdOl91PuOdioOEBI_0lMOcHFrW266pPlJsOkQ?e=iyEXkr

varunsh-xilinx commented 1 year ago

@Mahdi-CV, as a workaround, try this:

Replace these lines with:

vcpkg_cmake_configure(
  SOURCE_PATH ${SOURCE_PATH}/src/vai_runtime/vart
  OPTIONS
    -DENABLE_DPU_RUNNER=ON
)

And then rebuild the server.
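
After the rebuild, one quick way to check whether the library named in the original error can now be resolved is to try loading it with Python's ctypes. This is only a minimal sketch: the helper name can_load is made up for illustration, and it assumes the check runs in the same environment (same container and library search path) as the server.

```python
import ctypes
from typing import Tuple


def can_load(lib_name: str) -> Tuple[bool, str]:
    """Try to load a shared library by name; return (ok, error message)."""
    try:
        # CDLL uses the normal dynamic-loader search path
        # (LD_LIBRARY_PATH, ldconfig cache, and so on).
        ctypes.CDLL(lib_name)
    except OSError as exc:
        return False, str(exc)
    return True, ""


if __name__ == "__main__":
    # Library name taken from the fatal log line in the original report.
    ok, err = can_load("libvart-dpu-runner.so")
    print("loadable" if ok else f"not loadable: {err}")
```

If this reports "not loadable", the loader still cannot find the library, so the example would be expected to fail in the same way as before.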

Mahdi-CV commented 1 year ago

Thanks @varunsh-xilinx, that got rid of the error I was facing. However, I now run into another error that I have no idea how to handle:

F20230810 14:52:58.087687 4536 buffer_object_xrt_imp.cpp:78] Check failed: bo_ != XRT_NULL_BO allocation failure: xrt_.handle 0x7fb1640e7f70 xrt_.device_id 0 size 2048 xrt_.flags 0x1000000
*** Check failure stack trace: ***

Checking with dmesg shows this:

[ 4676.517264] xocl 0000:0b:00.1: ffff9cbd3ca3a0b0 check_bo_user_reqs: Bank 0 is marked as unused in axlf
[ 4676.518499] [drm:__xocl_create_bo_ioctl [xocl]] *ERROR* object creation failed idx 0, size 0x800

Note that I have another Docker container that runs the Vitis AI examples successfully on my V70 card. I also checked the XRT version in the amdinference container: it matches the host, and it matches the version in my other Vitis AI container, which is based on the Vitis AI repo. Any idea about the error I am getting?
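
For anyone comparing kernel output between a working and a failing container, a small filter can pull out just the xocl driver messages. This is a rough sketch, not part of the inference server: the function name xocl_messages is hypothetical, the first two sample lines are copied from the dmesg output above, and the third is made-up filler included only to show that unrelated lines are dropped.

```python
import re
from typing import List

# Matches the kernel timestamp prefix, e.g. "[ 4676.517264] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def xocl_messages(dmesg_text: str) -> List[str]:
    """Return the dmesg lines that mention xocl, with timestamps removed."""
    messages = []
    for line in dmesg_text.splitlines():
        if "xocl" in line:
            messages.append(_TIMESTAMP.sub("", line.strip()))
    return messages


sample = """\
[ 4676.517264] xocl 0000:0b:00.1: ffff9cbd3ca3a0b0 check_bo_user_reqs: Bank 0 is marked as unused in axlf
[ 4676.518499] [drm:__xocl_create_bo_ioctl [xocl]] *ERROR* object creation failed idx 0, size 0x800
[ 4676.600000] usb 1-1: unrelated filler line (not from the report)
"""

for message in xocl_messages(sample):
    print(message)
```

With the timestamps stripped, the output of two runs can be diffed directly to see whether the "marked as unused in axlf" message appears only in the failing setup.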

varunsh-xilinx commented 1 year ago

Unfortunately, I do not. It could be related to the VAI runtime code (e.g. vart and others) in the container. Maybe the installed versions don't support V70 but that's just a guess.

Mahdi-CV commented 1 year ago

I am still stuck on the same issue. The versions installed through the inference server's dependencies seem to be Vitis 3.0, which should support V70. I am just curious: were you able to run the resnet50 example successfully on V70?

varunsh-xilinx commented 1 year ago

Sorry, I have not tried it with a V70 beyond seeing the original error on this issue.

varunsh-xilinx commented 1 year ago

@Mahdi-CV, please try with a new container from the repo if you can. I was able to run the resnet50 example on V70 with it.