CHIP-SPV / chipStar

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.

ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY with immediate command lists on iGPU #652

Closed pjaaskel closed 1 year ago

pjaaskel commented 1 year ago

Various tests now fail on my iGPU with an out-of-memory error since immediate command lists (ICL) were enabled by default. For example:

361/994 Test #359: Unit_hipGraphAddEventRecordNode_MultipleRun ...............................***Failed    8.98 sec
CHIP error [TID 227578] [1697459999.905247949] : hipErrorTbd (ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY ) in /home/pjaaskel/src/chipStar/src/backend/Level0/CHIPBackendLevel0.cc:365:recordStream
...
406/994 Test #405: Unit_hipStreamBeginCapture_BasicFunctional ................................***Failed    8.87 sec
CHIP error [TID 227854] [1697460021.111924867] : hipErrorNotInitialized (ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY ) in /home/pjaaskel/src/chipStar/src/backend/Level0/CHIPBackendLevel0.cc:1478:memCopyAsyncImpl

CHIP error [TID 227854] [1697460021.112094190] : Caught Error: hipErrorNotInitialized
Filters: Unit_hipStreamBeginCapture_BasicFunctional
...
561/994 Test #533: Unit_hipMallocPitch_ValidatePitch .........................................***Failed   38.60 sec
CHIP error [TID 228750] [1697460077.458608272] : hipErrorNotInitialized (ZE_RESULT_ERROR_OUT_OF_DEVICE_MEMORY ) in /home/pjaaskel/src/chipStar/src/backend/Level0/CHIPBackendLevel0.cc:1478:memCopyAsyncImpl

CHIP error [TID 228750] [1697460077.458833774] : Caught Error: hipErrorNotInitialized
Filters: Unit_hipMallocPitch_ValidatePitch
...
pvelesko commented 1 year ago

iGPUs and ICLs do not work well together. This combination is not supported.

pjaaskel commented 1 year ago

This will be addressed in #650 by not enabling ICL by default on iGPU.

pvelesko commented 1 year ago

@pjaaskel can you re-test on your system now that #665 is merged?

pjaaskel commented 1 year ago

Setting export CHIP_L0_IMM_CMD_LISTS=OFF or ON makes no difference to the test results (at least with --num-threads=1), so I suppose it's fixed?
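
(Editorial sketch, not part of the original comment.) One way to check that two runs "make no difference" is to compare the sets of failed test names from each run's captured output. The snippet below is a minimal sketch that assumes the output uses the same "Test #N: name ...***Failed" line format as the logs quoted earlier in this issue. The helper names failed_tests and compare_runs are hypothetical, and the snippet does not launch the test suite itself.

```python
import re

# Matches summary lines in the format quoted in this issue, e.g.:
#   361/994 Test #359: Unit_foo ........***Failed    8.98 sec
FAILED_LINE = re.compile(r"Test\s+#\d+:\s+(\S+)\s+\.*\s*\*\*\*Failed")


def failed_tests(test_output):
    """Return the set of test names reported as ***Failed (hypothetical helper)."""
    names = set()
    for line in test_output.splitlines():
        match = FAILED_LINE.search(line)
        if match:
            names.add(match.group(1))
    return names


def compare_runs(output_off, output_on):
    """Compare failures between a run with ICL off and a run with ICL on."""
    off = failed_tests(output_off)
    on = failed_tests(output_on)
    return {
        "only_off": off - on,  # failed only with CHIP_L0_IMM_CMD_LISTS=OFF
        "only_on": on - off,   # failed only with CHIP_L0_IMM_CMD_LISTS=ON
        "both": off & on,      # failed in both configurations
    }
```

If "only_off" and "only_on" both come back empty, the two settings produced the same set of failures, which is the situation described in the comment above.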

pvelesko commented 1 year ago

Was it previously running out of memory with 1 thread as well?

pjaaskel commented 1 year ago

Yes.