VeriSilicon / TIM-VX

VeriSilicon Tensor Interface Module

Failed to compile vx shader Cannot find the header file cl_viv_vx_ext.h #614

Closed. burtbick closed this issue 1 year ago.

burtbick commented 1 year ago

After having resolved an issue with not being able to process yolov5 model output we received a patch from NXP to fix that.

But there is another issue. NXP pointed me to an issue here, but that issue doesn't really clarify what is causing the problem; it just refers to pointing at the prebuilt x86 SDK tree, which makes no sense for an application built to run on ARM.

Here's the failure that we get when kicking off an inference from our C++ application:

```
(266:0) : error : Error(0,266) : Cannot find the header file cl_viv_vx_ext.h.
(25:0) : error : undefined identifier: 'COPY'
(59:0) : error : undefined identifier: 'COPY'
(268:0) : error : syntax error at 'VXC_512Bits'

ERROR: Failed to compile vx shader. (error: FFFFFFFF)
E [kernel/vsi_nn_kernel.c:_gpu_register:600]Build program fail.
E [kernel/vsi_nn_kernel.c:vsi_nn_kernel_create_node:874]Register client kernel com.vivantecorp.extension.evis.resizenearest.
(266:0) : error : Error(0,266) : Cannot find the header file cl_viv_vx_ext.h.
(25:0) : error : undefined identifier: 'COPY'
(59:0) : error : undefined identifier: 'COPY'
(268:0) : error : syntax error at 'VXC_512Bits'
```

One thing that NXP said is that TIM-VX will register kernels for BOTH the GPU and the NPU. We currently do not have our build set up to support the GPU; we want to use just the NPU.

So, how do we resolve this issue and make sure that inference only uses the NPU? I know there are a couple of environment variables that might help, but when I tested them they didn't appear to prevent the above error, even when set for the NPU. The failure was essentially the same as the default with no env var set, which is supposed to select the NPU.

It appears that the VX delegate / TIM-VX is attempting to perform some operation on the GPU, but that is just a guess from remote-debugging the execution path.

On the full Yocto build with the specified NXP BSP and Tensorflow Lite versions we do not see this issue, but I suspect that it may be because the GPU is supported in that build.

sunshinemyson commented 1 year ago

@burtbick,

You can fix this by:

  1. `export VIVANTE_SDK_DIR=<path to prebuilt-sdk/x86_64_linux>`, or
  2. copying the missing header to the same directory as your test program.
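As a concrete sketch of option 1: the SDK path below is a placeholder (use wherever the header actually lives on your target image), and the `include/CL/` subdirectory is an assumption based on the layout of the prebuilt SDK tree.

```shell
# Point the runtime shader compiler at an SDK tree containing the header.
# /usr/share/vivante-sdk is a placeholder path, not a guaranteed location.
export VIVANTE_SDK_DIR=/usr/share/vivante-sdk

# Sanity check: verify the header the compiler complains about is present.
if [ -f "$VIVANTE_SDK_DIR/include/CL/cl_viv_vx_ext.h" ]; then
    echo "header found"
else
    echo "header missing: fix VIVANTE_SDK_DIR or copy cl_viv_vx_ext.h next to your binary"
fi
```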
burtbick commented 1 year ago

Thanks,

OK, so we are running on an ARM platform, not x86_64. And this error happens when attempting to run an inference using the VX delegate with a yolov5 model. The error does NOT happen when cross-building the application for the target.

Why would it attempt to build any code during execution? And more importantly, since our application is running on a read-only file system, where would the result of the compilation be located? Would it be in /tmp?



sunshinemyson commented 1 year ago

Yes, we need to compile some kernel functions before inference; the output will be in the tmp folder.

burtbick commented 1 year ago

OK, thanks.

Just to be clear: will this happen for both GPU and NPU, or if we set the GPU env var to 0, will it NOT try to compile at run time?

Thanks again,



sunshinemyson commented 1 year ago

VIVANTE_SDK_DIR is required for both GPU and NPU.

burtbick commented 1 year ago

Thanks Sven,

One last clarification and I should be done asking questions.

So, I think you are saying that even if we have set the environment variable to not use the GPU, the inference will still attempt to build kernels dynamically, even when only the NPU is used. Is that correct?

Thanks again, Burt



robert-kalmar commented 1 year ago

Hi @burtbick, both the GPU and the NPU require building some OpenCL kernels before execution; the env variable (USE_GPU_INFERENCE) determines whether the kernels are built for the GPU or the NPU. To clarify, OpenCL is also partially used inside the NPU, so the fact that an OpenCL build is executed does not mean the GPU will be used. The env variable USE_GPU_INFERENCE controls whether the GPU or the NPU is used exclusively for model acceleration by the VX Delegate. Running on both GPU and NPU in parallel is not supported.
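A minimal launch sketch following this explanation. USE_GPU_INFERENCE and VIVANTE_SDK_DIR are the variables named in this thread; the SDK path and the application invocation are placeholders.

```shell
# Select the accelerator for the VX Delegate:
#   0 = NPU only, 1 = GPU only (parallel use of both is not supported)
export USE_GPU_INFERENCE=0

# Needed in BOTH cases: the startup OpenCL kernel build runs regardless
# of which device is selected. Placeholder path.
export VIVANTE_SDK_DIR=/usr/share/vivante-sdk

# Then launch inference, e.g. (hypothetical binary and model names):
#   ./my_app --model yolov5s.tflite
echo "delegate device: $( [ "$USE_GPU_INFERENCE" = "1" ] && echo GPU || echo NPU )"
```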

Next, can you please elaborate on "On the full Yocto build with the specified NXP BSP and Tensorflow Lite versions we do not see this issue, but I suspect that it may be because the GPU is supported in that build." If I understand correctly, it works fine with the full Yocto image but does not work on your setup. What does your setup look like? To use the NPU you also need the GPU driver; both the GPU and the NPU are controlled by the GPU driver.

burtbick commented 1 year ago

Thanks for the clarification,

I didn't build the actual project system; that is being handled by another developer, and he was trying to minimize the size of the system. So he didn't include the SDK files, or at least not all of them.

To work on the issues we were having with the NPU not processing Yolov5 models properly I did a Yocto build of the full system, and that included all of the required SDK files.

I'll be switching back to the target system build when I'm back on that project in about a week and will do some additional testing. I'll also pass this information on to the other developer, so hopefully he will add the missing files back into our target system build and we should be good to go there.

This information is very helpful.

Thanks again, Burt

