GPUOpen-LibrariesAndSDKs / Orochi

Fail running test64 app on RX Vega on Windows.. #17

Closed oscarbg closed 2 years ago

oscarbg commented 2 years ago

Hi, great project. I tested on a system with an NVIDIA GPU and an AMD GPU, and on CUDA (NVIDIA) it works great. Using the latest commit (https://github.com/GPUOpen-LibrariesAndSDKs/Orochi/commit/2b8da8c20139daa7f92ea711767b15778204d1a4), running the test64 sample on the RX Vega with the HIP backend (DeviceEnum shows the same error), I get:

executing on hip
testing initialization
testing device props
executing on Radeon RX Vega (gfx900:xnack-)
testing kernel execution
Hostcall: invalid service request.
done

The error is "Hostcall: invalid service request." Debugging shows it is caused by the printf in the kernel code: "{ int a = threadIdx.x; printf(\" thread %d running\n\", a); }". With the printf commented out it runs fine: "{ int a = threadIdx.x; /* printf(\" thread %d running\n\", a); */ }".

Note that I'm on the latest AMD 22.3.2 driver (21.50 branch). Being a Windows Insider, I also got a driver from a new branch (22.10, 22.10.00.22 30.0.21000.22018). I ran the test on that driver and got the same error.

Question: are printf calls in HIP kernels unsupported on Vega cards and only supported on newer RDNA/RDNA2 GPUs, or is this a bug that should eventually be fixed? Thanks.

jammm commented 2 years ago

EDIT: Let me check this on a vega machine to see if I can reproduce this and get back to you.

Hey there! Thanks for checking out our project, it's great to hear positive feedback :) I don't think we officially support Vega on Windows HIP. AFAIK, at the moment we officially support HIP on Windows on newer RDNA2 GPUs on the latest driver. This could explain why printf isn't working on Vega. I'm also not sure if this will be eventually fixed or not. Having said that, you could certainly raise an issue at https://github.com/ROCm-Developer-Tools/HIP and ask them if they're willing to fix this issue on Windows. Hope that helps!

Cheers, Jam

jammm commented 2 years ago

Hey @oscarbg ,

We tried on a PRO wx9100 GPU (also a gfx900) and it seems to run fine:

>> executing on hip
>> testing initialization
>> testing device props
executing on Radeon (TM) Pro WX 9100 (gfx900:xnack+)
>> testing kernel execution
thread 0 running
thread 1 running
thread 2 running
thread 3 running
thread 4 running
thread 5 running
thread 6 running
thread 7 running
thread 8 running
thread 9 running
thread 10 running
thread 11 running
thread 12 running
thread 13 running
thread 14 running
thread 15 running
thread 16 running
thread 17 running
thread 18 running
thread 19 running
thread 20 running
thread 21 running
thread 22 running
thread 23 running
thread 24 running
thread 25 running
thread 26 running
thread 27 running
thread 28 running
thread 29 running
thread 30 running
thread 31 running

This was running on the 21.40 beta for blender 3.0 driver.

Also tested on RX Vega:

>> executing on hip
>> testing initialization
>> testing device props
executing on Radeon RX Vega (gfx900:xnack-)
>> testing kernel execution
thread 0 running
thread 1 running
thread 2 running
thread 3 running
thread 4 running
thread 5 running
thread 6 running
thread 7 running
thread 8 running
thread 9 running
thread 10 running
thread 11 running
thread 12 running
thread 13 running
thread 14 running
thread 15 running
thread 16 running
thread 17 running
thread 18 running
thread 19 running
thread 20 running
thread 21 running
thread 22 running
thread 23 running
thread 24 running
thread 25 running
thread 26 running
thread 27 running
thread 28 running
thread 29 running
thread 30 running
thread 31 running
>> done

This was on the 22.3.1 driver.

I'm unable to reproduce the issue you're facing. Can you provide more details about your environment? Is it a VM?

ableeker commented 2 years ago

I can confirm that on my machine with a Vega 64 this test runs successfully without any changes.

jammm commented 2 years ago

@oscarbg can you try the latest main branch? We removed the printf call from there so it shouldn't fail this time I believe. Feel free to re-open this ticket if the tests still fail for you.

oscarbg commented 2 years ago

Thanks @jammm, but I took a quick look and ran a test, and the printf hasn't been removed from /test/DeviceEnum/main.cpp. I still have to comment it out:

const char* code = "extern \"C\" __global__ " "void testKernel()" "{ int a = threadIdx.x; /* printf(\" thread %d running\n\", a); */ }";
const char* funcName = "testKernel";
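For anyone hitting the same error, here is a rough sketch of toggling the device-side printf without hand-editing the kernel string each time. The helper name `makeTestKernelSource` and its `enablePrintf` flag are hypothetical and not part of Orochi. The sketch only builds the source string, which would then be passed to the runtime compiler the same way the test passes `code`.

```cpp
#include <string>

// Hypothetical helper (not part of Orochi): builds the source of the test
// kernel, with or without the device-side printf that triggered
// "Hostcall: invalid service request." on the affected setup.
std::string makeTestKernelSource(bool enablePrintf)
{
    std::string src =
        "extern \"C\" __global__ void testKernel()\n"
        "{\n"
        "    int a = threadIdx.x;\n";

    if (enablePrintf)
    {
        // "\\n" keeps the escape sequence intact inside the kernel source.
        src += "    printf(\"thread %d running\\n\", a);\n";
    }
    else
    {
        // Keep 'a' referenced so the kernel body stays warning-free.
        src += "    (void)a;\n";
    }

    src += "}\n";
    return src;
}
```

Calling `makeTestKernelSource(false)` gives a kernel equivalent to the commented-out version above, so the rest of the test can stay unchanged on setups where the hostcall path fails.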

jammm commented 2 years ago

@oscarbg you're right. I fixed the UnitTest and not this particular test. Let me do just that. Thanks!

jammm commented 2 years ago

Merged #37