CHIP-SPV / chipStar

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.

memcpy/memset issues with imm. command lists on iGPU #653

Closed pjaaskel closed 1 year ago

pjaaskel commented 1 year ago

There appears to be a synchronization issue that affects some of the memcpy/memset tests. For example:

520/994 Test #521: Unit_hipHostRegister_Memcpy - float .......................................***Failed   10.16 sec
Filters: Unit_hipHostRegister_Memcpy - float

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hipHostRegister is a Catch v2.13.4 host application.
Run with -? for options

-------------------------------------------------------------------------------
Unit_hipHostRegister_Memcpy - float
-------------------------------------------------------------------------------
/home/pjaaskel/src/chipStar/HIP/tests/catch/unit/memory/hipHostRegister.cc:126
...............................................................................

/home/pjaaskel/src/chipStar/HIP/tests/catch/unit/memory/hipHostRegister.cc:67: FAILED:
  REQUIRE( Bh[i] == A[i] )
with expansion:
  0.0f == 1.0f
...

521/994 Test #520: Unit_hipHostRegister_Memcpy - int .........................................***Failed   10.48 sec
Filters: Unit_hipHostRegister_Memcpy - int

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hipHostRegister is a Catch v2.13.4 host application.
Run with -? for options

-------------------------------------------------------------------------------
Unit_hipHostRegister_Memcpy - int
-------------------------------------------------------------------------------
/home/pjaaskel/src/chipStar/HIP/tests/catch/unit/memory/hipHostRegister.cc:126
...............................................................................

/home/pjaaskel/src/chipStar/HIP/tests/catch/unit/memory/hipHostRegister.cc:67: FAILED:
  REQUIRE( Bh[i] == A[i] )
with expansion:
  0 == 1

...

532/994 Test #522: Unit_hipHostRegister_Memcpy - double ......................................***Failed   14.87 sec
Filters: Unit_hipHostRegister_Memcpy - double

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hipHostRegister is a Catch v2.13.4 host application.
Run with -? for options

-------------------------------------------------------------------------------
Unit_hipHostRegister_Memcpy - double
-------------------------------------------------------------------------------
/home/pjaaskel/src/chipStar/HIP/tests/catch/unit/memory/hipHostRegister.cc:126
...............................................................................

/home/pjaaskel/src/chipStar/HIP/tests/catch/unit/memory/hipHostRegister.cc:67: FAILED:
  REQUIRE( Bh[i] == A[i] )
with expansion:
  0.0 == 1.0
...
664/994 Test #665: Unit_hipMemsetFunctional_PartialSet_1D ....................................***Failed    0.99 sec
Filters: Unit_hipMemsetFunctional_PartialSet_1D

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hipMemsetFunctional is a Catch v2.13.4 host application.
Run with -? for options

-------------------------------------------------------------------------------
Unit_hipMemsetFunctional_PartialSet_1D
  hipMemsetD16 - Partial Set
-------------------------------------------------------------------------------
/home/pjaaskel/src/chipStar/HIP/tests/catch/unit/memory/hipMemsetFunctional.cc:219
...............................................................................

/home/pjaaskel/src/chipStar/HIP/tests/catch/unit/memory/hipMemsetFunctional.cc:55: FAILED:
  REQUIRE( (hostPtr.get()[i] == value) )

These failures started to appear after the ICL (immediate command list) and last-event dependency work was pulled in. The latter is the more likely cause, since I (think I) tested the ICL-by-default PR separately.

pvelesko commented 1 year ago

iGPU and ICL do not play well together. This is why we don't even test ICL with iGPU.

pjaaskel commented 1 year ago

OK, could we automatically disable ICL at runtime if we detect a non-functioning GPU? I just tested and all tests pass with the regular cmd list. Also #652 doesn't appear.

pvelesko commented 1 year ago

@pjaaskel Yes, added a TODO: https://github.com/CHIP-SPV/chipStar/pull/650

pvelesko commented 1 year ago

ICL does not work well on iGPUs due to driver issues, but it works well on dGPUs. We emit a warning when ICL is used on an iGPU.
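
For reference, the runtime fallback discussed above could look roughly like the sketch below. It is not chipStar's actual code. `chooseCmdListMode`, `IclRequest` and `CmdListMode` are hypothetical names, and `kIntegratedFlag` is assumed to mirror Level Zero's `ZE_DEVICE_PROPERTY_FLAG_INTEGRATED` bit in `ze_device_properties_t::flags`, so that the sketch builds without `ze_api.h`. In a real backend the flags would come from `zeDeviceGetProperties`.

```cpp
#include <cstdint>
#include <iostream>

// Assumption: mirrors ZE_DEVICE_PROPERTY_FLAG_INTEGRATED (bit 0 of
// ze_device_properties_t::flags) so this builds without ze_api.h.
constexpr std::uint32_t kIntegratedFlag = 1u << 0;

// Hypothetical types for illustration; not part of chipStar.
enum class CmdListMode { Regular, Immediate };
enum class IclRequest { Default, ForceOn, ForceOff };

// Picks the command list mode for a device.
// - No explicit request: regular command lists on iGPU, ICL on dGPU.
// - Explicit request: honored as-is, with a warning for ICL on iGPU.
inline CmdListMode chooseCmdListMode(std::uint32_t deviceFlags,
                                     IclRequest request) {
  const bool integrated = (deviceFlags & kIntegratedFlag) != 0;

  if (request == IclRequest::ForceOff)
    return CmdListMode::Regular;

  if (request == IclRequest::ForceOn) {
    if (integrated)
      std::cerr << "warning: immediate command lists requested on an "
                   "integrated GPU; memcpy/memset results may be wrong\n";
    return CmdListMode::Immediate;
  }

  // Default: fall back to regular command lists on integrated GPUs.
  return integrated ? CmdListMode::Regular : CmdListMode::Immediate;
}
```

With this shape, `chooseCmdListMode(kIntegratedFlag, IclRequest::Default)` selects regular command lists on an iGPU, while an explicit `IclRequest::ForceOn` still enables ICL and only prints the warning.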