ROCm / ROCK-Kernel-Driver

AMDGPU driver with KFD used by the ROCm project. Also contains the current Linux kernel that matches this base driver.

ROCK kernel API for kthread options? #146

Open jpsollie opened 1 year ago

jpsollie commented 1 year ago

A few days ago, I gave up on hardware RAID (due to having MANY issues with different controllers) and am preparing to go for Linux software RAID instead.

An idea I'm playing with is to see whether I can develop a module so that the Linux kernel uses the GPU when available (HSA / amdkfd), so the CPU isn't loaded with the RAID 6 calculations needed to verify integrity. It has other jobs to do.

I know OpenCL, but is there a way to use HSA/amdkfd in the Linux kernel directly, without an external LLVM-based compiler? And if so, is there ANY documentation about that?

fxkamd commented 1 year ago

I don't know of any precedent for using OpenCL in kernel mode. It requires significant user-mode software that would not be practical to run in kernel mode. I won't even mention the potential security implications. If you want to use GPU computing in kernel mode, you'd have to do it at a much lower level: offline-compiled shader code, plus a simple kernel-mode runtime to dispatch the compute jobs to the GPU, set up the parameters and memory mappings, etc. Shader ISA varies between GPU generations, so this would not be portable at all. Maybe you could load the shader code from a user-mode helper. You'd also need some kind of kernel-mode interface from RAID to the GPU driver.

Finally, if the GPU ever crashes, you risk data corruption on your drives without some very careful error/timeout handling. Given that the point of RAID is improved reliability and data integrity, I would recommend against using GPUs, except maybe some recent data-center GPUs that support RAS (e.g. ECC for VRAM).
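One way to picture the "careful error/timeout handling" is a minimal sketch, assuming the device may only write into a private scratch buffer and the real parity is updated only after the device reports completion within the timeout; otherwise the CPU computes the same XOR parity. This is plain user-space C that models only the control flow: `parity_offload_fn`, `parity_with_fallback`, `parity_cpu` and the `fake_device_*` stubs are hypothetical names, not part of amdgpu, amdkfd or the md RAID code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical offload hook (no such interface exists in amdgpu/amdkfd):
 * computes the XOR parity of ndisks strips into out and returns true only
 * if the device completed within timeout_ms and reported no error. */
typedef bool (*parity_offload_fn)(size_t ndisks, size_t len,
                                  const uint8_t *const *data, uint8_t *out,
                                  unsigned timeout_ms, void *ctx);

/* CPU reference path: P = D_0 ^ D_1 ^ ... ^ D_(n-1). */
void parity_cpu(size_t ndisks, size_t len, const uint8_t *const *data,
                uint8_t *p)
{
    memset(p, 0, len);
    for (size_t i = 0; i < ndisks; i++)
        for (size_t b = 0; b < len; b++)
            p[b] ^= data[i][b];
}

/* Try the device first, but let it write only into a private scratch
 * buffer; p is touched only after the device reports success, so a hung or
 * crashed job cannot leave half-written parity behind. Returns true if the
 * offloaded result was used, false if the CPU fallback ran instead. */
bool parity_with_fallback(size_t ndisks, size_t len,
                          const uint8_t *const *data, uint8_t *p,
                          parity_offload_fn offload, void *ctx,
                          unsigned timeout_ms)
{
    assert(data != NULL && p != NULL);

    uint8_t *scratch = offload ? malloc(len) : NULL;
    bool ok = scratch != NULL &&
              offload(ndisks, len, data, scratch, timeout_ms, ctx);

    if (ok)
        memcpy(p, scratch, len);
    else
        parity_cpu(ndisks, len, data, p);

    /* Simplification: in a real driver a timed-out job might still be
     * writing to scratch, so it could only be released once the device is
     * known to be idle (e.g. after a reset). */
    free(scratch);
    return ok;
}

/* Stand-ins for a real device, for trying the control flow out. */
bool fake_device_ok(size_t ndisks, size_t len, const uint8_t *const *data,
                    uint8_t *out, unsigned timeout_ms, void *ctx)
{
    (void)timeout_ms;
    (void)ctx;
    parity_cpu(ndisks, len, data, out); /* "device" computes correctly */
    return true;
}

bool fake_device_hung(size_t ndisks, size_t len, const uint8_t *const *data,
                      uint8_t *out, unsigned timeout_ms, void *ctx)
{
    (void)ndisks;
    (void)data;
    (void)timeout_ms;
    (void)ctx;
    memset(out, 0xAA, len); /* partial garbage, then the timeout expires */
    return false;
}
```

With this shape, a hung or crashed GPU costs one timeout plus a CPU recomputation instead of corrupt parity on disk. It does not help if the GPU silently returns wrong data, which is the kind of failure the RAS features (such as VRAM ECC) aim to reduce.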

jpsollie commented 1 year ago

Thanks for your reply. I am aware of the risks of GPU offloading, though I wonder whether ECC memory would be a big deal here: I want the GPU to verify the correctness of the two parity (ECC) strips when performing a read operation, or to recalculate missing data in the case of a degraded array. The second is an atomic operation when you have a journal. But it looks pretty complex indeed. Do you think the accel subsystem introduced in Linux 6.2 could be of any use?
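Both read-path jobs described here, checking P and Q on read and rebuilding a lost data strip, come down to RAID 6 arithmetic over GF(2^8). The following is a minimal CPU reference sketch, assuming the conventions Linux md uses for RAID 6 (generator g = 2, field polynomial 0x11d, P the XOR of the data strips, Q the GF(2^8) sum of g^i * D_i). The function names and the `MAX_STRIP` bound are hypothetical, chosen for this sketch. Any GPU implementation would have to produce bit-identical results, so a reference like this would also serve as a cross-check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_STRIP 4096 /* hypothetical strip-size bound for this sketch */

/* Multiply by g = 2 in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1. */
uint8_t gf_mul2(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1d : 0x00));
}

/* P = XOR of all data strips; Q = sum of g^i * D_i (Horner's rule). */
void gen_syndrome(size_t ndisks, size_t len, const uint8_t *const *data,
                  uint8_t *p, uint8_t *q)
{
    for (size_t b = 0; b < len; b++) {
        uint8_t wp = 0, wq = 0;
        for (size_t i = ndisks; i-- > 0;) {
            wp ^= data[i][b];
            wq = gf_mul2(wq) ^ data[i][b];
        }
        p[b] = wp;
        q[b] = wq;
    }
}

/* Read-path check: recompute both syndromes and compare with stored P/Q. */
bool verify_stripe(size_t ndisks, size_t len, const uint8_t *const *data,
                   const uint8_t *p, const uint8_t *q)
{
    uint8_t p2[MAX_STRIP], q2[MAX_STRIP];

    assert(len <= MAX_STRIP);
    gen_syndrome(ndisks, len, data, p2, q2);
    return memcmp(p, p2, len) == 0 && memcmp(q, q2, len) == 0;
}

/* Degraded read, P intact: D_missing = P ^ (XOR of surviving strips).
 * data[missing] is never read. */
void recover_from_p(size_t ndisks, size_t len, const uint8_t *const *data,
                    size_t missing, const uint8_t *p, uint8_t *out)
{
    assert(missing < ndisks);
    memcpy(out, p, len);
    for (size_t i = 0; i < ndisks; i++)
        if (i != missing)
            for (size_t b = 0; b < len; b++)
                out[b] ^= data[i][b];
}

/* Data strip and P both lost: Q ^ Q_x = g^missing * D_missing, where Q_x
 * is Q recomputed with the missing strip taken as zero. */
void recover_from_q(size_t ndisks, size_t len, const uint8_t *const *data,
                    size_t missing, const uint8_t *q, uint8_t *out)
{
    assert(missing < ndisks && ndisks <= 255);
    for (size_t b = 0; b < len; b++) {
        uint8_t wq = 0;
        for (size_t i = ndisks; i-- > 0;)
            wq = gf_mul2(wq) ^ (i == missing ? 0 : data[i][b]);
        /* Undo g^missing by multiplying by g^(255 - missing): g^255 = 1. */
        uint8_t v = q[b] ^ wq;
        for (size_t k = 0; k < (255 - missing) % 255; k++)
            v = gf_mul2(v);
        out[b] = v;
    }
}
```

`verify_stripe` covers the read-path check, `recover_from_p` the usual degraded read, and `recover_from_q` the case where a data strip and P are both unavailable. Losing two data strips at once requires solving for both unknowns with P and Q together, which this sketch leaves out.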

ppanchad-amd commented 3 weeks ago

@jpsollie Do you still require assistance with this ticket? If not, please close the ticket. Thanks!