ROCm / ROCK-Kernel-Driver

AMDGPU Driver with KFD used by the ROCm project. Also contains the current Linux Kernel that matches this base driver

drm/amdkfd: fix the incorrect exception handling logic in function amd_acquire() #176

Open zxpdemonio opened 3 weeks ago

zxpdemonio commented 3 weeks ago

In amd_acquire(), kfd_get_process() is called to look up the process. The result must not be checked against NULL, because kfd_get_process() returns ERR_PTR(-EINVAL) rather than a null pointer on error.

Because of this incorrect check, the error pointer passes the NULL test and the kernel panics as soon as kfd_get_process() returns ERR_PTR(-EINVAL).

So the correct check is: if (IS_ERR(p)) {

Fixes: 779b4d05a1c9 ("drm/amdkfd: Add RDMA and PeerDirect support")

Fixes: #175
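
For readers unfamiliar with the kernel's error-pointer convention, here is a minimal userspace sketch of why a NULL check misses ERR_PTR(-EINVAL) while IS_ERR() catches it. It is not the actual patch: ERR_PTR, PTR_ERR and IS_ERR below are simplified stand-ins for the helpers in the kernel's linux/err.h, and struct demo_process, demo_get_process(), null_check_catches() and demo_acquire() are hypothetical names invented for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    /* Only negative errno values in [-MAX_ERRNO, -1] are valid here. */
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for the process object. */
struct demo_process {
    int pasid;
};

/* Hypothetical stand-in for kfd_get_process(): never returns NULL,
 * it returns ERR_PTR(-EINVAL) on failure. */
static inline struct demo_process *demo_get_process(bool fail)
{
    static struct demo_process proc = { .pasid = 42 };

    return fail ? ERR_PTR(-EINVAL) : &proc;
}

/* The check described as incorrect: an error pointer is not NULL,
 * so this returns false even when the lookup failed. */
static inline bool null_check_catches(const struct demo_process *p)
{
    return p == NULL;
}

/* The check described as correct: test with IS_ERR() and propagate
 * the encoded errno instead of dereferencing the pointer. */
static inline long demo_acquire(bool fail)
{
    struct demo_process *p = demo_get_process(fail);

    if (IS_ERR(p))
        return PTR_ERR(p);

    /* Safe to dereference only after the IS_ERR() check. */
    return p->pasid > 0 ? 0 : -EINVAL;
}
```

Calling demo_acquire(true) returns -EINVAL without touching the pointer, whereas null_check_catches(demo_get_process(true)) is false, so code guarded only by a NULL check would go on to dereference an invalid address.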

hkasivis commented 3 weeks ago

Looks good. We will merge it soon.

whchung commented 1 week ago

Has the patch been submitted upstream? I couldn't find it in either of these archives:
https://lists.freedesktop.org/archives/amd-gfx/2024-October/
https://lists.freedesktop.org/archives/amd-gfx/2024-November/

kentrussell commented 1 week ago

PeerDirect isn't upstreamable, so it cannot go to amd-staging-drm-next (which is what the amd-gfx mailing list is for). You'll notice that kfd_peerdirect.c isn't even present in the amd-staging-drm-next branch. However, I can confirm that this patch was picked up internally and will be in the upcoming ROCm release.