ROCm / clr

MIT License

Fix segfault with -O0-compiled kernels #13

Open ldrumm opened 9 months ago

ldrumm commented 9 months ago

kernels_[blitType] yields a null function pointer in KernelBlitManager::initHeap. The cause is that KernelBlitManager::createProgram does not initialize every kernel, because the layout invariants of the blit-kernel enumeration are broken.

KernelBlitManager::NumBlitKernels has two possible return values, BlitTotal and BlitLinearTotal, both of which are sentinels. KernelBlitManager::createProgram uses the returned sentinel to initialize the kernels_ array. It correctly initializes [0, NumBlitKernels()), but InitHeap has a numeric value above BlitLinearTotal, so when NumBlitKernels() returns BlitLinearTotal the InitHeap kernel is not loaded.

Thus, when images are disabled and a kernel has a hidden_heap_v1 entry, the InitHeap blit kernel is not loaded, and KernelBlitManager::initHeap subsequently gets a null pointer. A kernel always has such a hidden_heap_v1 descriptor entry at -O0, but I believe it's possible (unconfirmed) for this situation to occur in other circumstances.

The fix is simply to ensure the InitHeap enumeration has a numeric value less than BlitLinearTotal.
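The layout problem and the fix can be sketched in isolation. This is a minimal, self-contained model and not the actual clr code: only InitHeap, BlitLinearTotal and BlitTotal are names from the description above, while FillBuffer, CopyBuffer, CopyImage and the helper initHeapLoaded are hypothetical placeholders. It also assumes that the count of kernels to load is BlitTotal with image support and BlitLinearTotal without it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified layout before the fix.
namespace before_fix {
enum BlitKernel : std::size_t {
  FillBuffer,                   // placeholder linear kernel
  CopyBuffer,                   // placeholder linear kernel
  BlitLinearTotal,              // sentinel: end of the non-image kernels
  CopyImage = BlitLinearTotal,  // placeholder image kernel
  InitHeap,                     // sits above BlitLinearTotal
  BlitTotal                     // sentinel: end of all kernels
};
}  // namespace before_fix

// Hypothetical, simplified layout after the fix.
namespace after_fix {
enum BlitKernel : std::size_t {
  FillBuffer,
  CopyBuffer,
  InitHeap,                     // now below BlitLinearTotal
  BlitLinearTotal,
  CopyImage = BlitLinearTotal,
  BlitTotal
};
// Guard the invariant so a later reordering fails at compile time.
static_assert(InitHeap < BlitLinearTotal,
              "InitHeap must be loaded even when images are disabled");
}  // namespace after_fix

// Models the loading loop: only slots in [0, count) are marked as loaded,
// where count stands in for NumBlitKernels(). Returns whether the InitHeap
// slot ended up loaded.
bool initHeapLoaded(std::size_t initHeap, std::size_t linearTotal,
                    std::size_t total, bool imageSupport) {
  assert(initHeap < total);
  const std::size_t count = imageSupport ? total : linearTotal;
  std::vector<bool> loaded(total, false);
  for (std::size_t i = 0; i < count; ++i) {
    loaded[i] = true;
  }
  return loaded[initHeap];
}
```

In this model, calling initHeapLoaded with the before_fix values and imageSupport set to false reports the InitHeap slot as not loaded, which corresponds to the null pointer seen in initHeap; with the after_fix values the slot is loaded in both configurations.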

N.B. This bug does not exist on the 5.7 release, but that appears to be by chance.

cjatin commented 9 months ago

cc'ing: @gandryey @chrispaquot @saleelk