corngood / nixpkgs

Packaging the ROCK (4.11 + AMD patches) kernel #4

Open alexanderkjeldaas opened 6 years ago

alexanderkjeldaas commented 6 years ago

Issue description

The ROCm project needs the ROCK kernel for a good while longer. The current ROCm release is 1.6, and 1.7 is being released now. At least one more release will follow before the required functionality is upstreamed.

It seems like packaging the ROCK kernel is the right thing to do in the meantime.

https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver

The last changes were in August, and there are indications that it's being rebased on top of 4.13 based on https://github.com/RadeonOpenCompute/ROCm/issues/256 comments.

alexanderkjeldaas commented 6 years ago

The kernel should be called 4.11.0-kfd, based on the naming convention the Ubuntu packages use.

corngood commented 6 years ago

I had a look at the default kernel config in the ROCK kernel, and the only obvious relevant difference from Ubuntu's was DRM_AMD_DC=y, so I enabled that and built the kernel from their tree. If you want to try the kernel, you can cherry-pick 76cb3819c1b10139dad4d4f828a9544fb6ffb305 and set boot.kernelPackages = pkgs.linuxPackagesFor pkgs.linux_rock.

It boots without any obvious errors, and DC seems to be working.

I'll get into the userspace stuff next.

alexanderkjeldaas commented 6 years ago

Cool.. I also tested this yesterday. Didn't think you'd get around to it.. :-) https://github.com/NixOS/nixpkgs/pull/32376

alexanderkjeldaas commented 6 years ago

Userspace seems to be a little bit more involved..

alexanderkjeldaas commented 6 years ago

ROCm 1.7 should be out this week. https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/amd-linux/988926-amd-rolls-out-rocm-1-7-platform-for-supercomputing-17/page5

corngood commented 6 years ago

@alexanderkjeldaas So you tested this kernel already on NixOS? Do you have any other WIP stuff?

alexanderkjeldaas commented 6 years ago

I get some display issues. And wifi doesn't work. So it's not 100%.

alexanderkjeldaas commented 6 years ago

I have some stuff in https://github.com/alexanderkjeldaas/nixpkgs/tree/ak/rocm-changes

alexanderkjeldaas commented 6 years ago

roct compiles now

corngood commented 6 years ago

Ok, we should definitely coordinate this, because you're already doing what I was going to do next.

Is there anything in particular you're stuck on, or some work that can be divided?

alexanderkjeldaas commented 6 years ago

let's create a list of stuff:

I'm working on hcc, just finished hsa-runtime-amd.

Could you try ROCm-OpenCL-Driver?

corngood commented 6 years ago

Sure, I'll continue with it. I will have limited time before the weekend though.

alexanderkjeldaas commented 6 years ago

I'm looking at https://github.com/RadeonOpenCompute/ROCm-Device-Libs now

alexanderkjeldaas commented 6 years ago

I'm also afraid that I don't have time to finish this.. :-)

alexanderkjeldaas commented 6 years ago

ROCm-Device-Libs done. I can't edit the ROCm board.

alexanderkjeldaas commented 6 years ago

Continuing with hcc now. The current issue is finding the right HSA headers for hcc. I tried hsa-runtime-amd, but it looks like rocr-runtime is the one to use. The headers are still not found during some compilation steps.

alexanderkjeldaas commented 6 years ago

all in ak/rocm-changes in my tree.

alexanderkjeldaas commented 6 years ago

I'm working on ROCm-OpenCL-Runtime

alexanderkjeldaas commented 6 years ago

ROCm-OpenCL-Runtime is done.

corngood commented 6 years ago

I'll have some free time over the next few days, so I was going to have a look at your changes. I notice you have amdgpu-pro changes in there. Are you using any of the pro stack when testing ROCm?

Have you got to the point where the CL runtime is actually working?

alexanderkjeldaas commented 6 years ago

The CL runtime is "working" in that clinfo ++ work, things link etc. But I can't find my card yet.

On Tue, Dec 12, 2017 at 10:51 PM, David McFarland notifications@github.com wrote:

I'll have some free time over the next few days, so I was going to have a look at your changes. I notice you have amdgpu-pro changes in there. Are you using any of the pro stack when testing ROCm?

Have you got to the point where the CL runtime is actually working?

alexanderkjeldaas commented 6 years ago

I'm not using amdgpu-pro, or I'm testing it independently.

On Wed, Dec 13, 2017 at 12:27 AM, Alexander Kjeldaas ak@formalprivacy.com wrote:

The CL runtime is "working" in that clinfo ++ work, things link etc. But I can't find my card yet.

alexanderkjeldaas commented 6 years ago

I've just packaged the ROC-smi utility. It returns nothing on my system. Same with clinfo.

If you could figure out how the kernel and those utils need to work together, then I'd be happy.

On Wed, Dec 13, 2017 at 12:27 AM, Alexander Kjeldaas ak@formalprivacy.com wrote:

I'm not using amdgpu-pro, or I'm testing it independently.

corngood commented 6 years ago

Like you get literally nothing? I'm getting:

➜  nixpkgs git:(ak/rocm-changes) sudo $(nix-build -A roc-smi)/bin/rocm-smi

====================    ROCm System Management Interface    ====================
================================================================================
 GPU  DID    Temp     AvgPwr   SCLK     MCLK     Fan      Perf    OverDrive  ECC
  0   67b1   45.0c    N/A      483Mhz   1300Mhz  24.71%   auto      0%       N/A
================================================================================
====================           End of ROCm SMI Log          ====================

With an R9 290 using the kernel from my branch.

corngood commented 6 years ago

What's your /sys/module/amdgpu/parameters/dc? Mine is -1. Also, I didn't update any firmware from master.
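To compare values, the parameter mentioned above can be read straight from sysfs. This is a minimal sketch, not something from either branch: the function name `amdgpu_dc_param` and its optional path argument are made up for illustration, and only the sysfs path comes from the thread. As far as I know, -1 is the driver's "auto" default for this parameter, but treat that as an assumption.

```shell
#!/bin/sh
# Print the amdgpu "dc" module parameter, or "unavailable" if the file
# cannot be read (for example, when the amdgpu module is not loaded).
# The optional argument is a hypothetical override of the sysfs path.
amdgpu_dc_param() {
    param_file="${1:-/sys/module/amdgpu/parameters/dc}"
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        echo "unavailable"
    fi
}

amdgpu_dc_param
```

If this prints "unavailable", the module is probably not loaded at all, which is a different problem from DC being disabled.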

alexanderkjeldaas commented 6 years ago

I think my problem is that I'm booting with nomodeset, and then amdgpu refuses to load.
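A quick way to confirm that situation is to look for nomodeset on the running kernel's command line and check whether amdgpu is present under /sys/module. The sketch below assumes a standard Linux /proc and /sys layout; `has_nomodeset` is a hypothetical helper name, not part of any ROCm tool.

```shell
#!/bin/sh
# Succeed if "nomodeset" appears as a standalone word on the kernel
# command line. The optional argument is a hypothetical path override.
has_nomodeset() {
    cmdline_file="${1:-/proc/cmdline}"
    [ -r "$cmdline_file" ] || return 1
    # Split on spaces so that only an exact "nomodeset" token matches.
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'nomodeset'
}

if has_nomodeset; then
    echo "nomodeset is set: kernel mode setting is disabled"
else
    echo "nomodeset is not set"
fi

# /sys/module/amdgpu exists when the driver is loaded (or built in).
if [ -d /sys/module/amdgpu ]; then
    echo "amdgpu is loaded"
else
    echo "amdgpu is not loaded"
fi
```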

alexanderkjeldaas commented 6 years ago

I'm unable to boot without nomodeset.

alexanderkjeldaas commented 6 years ago

I'm going to try a few other motherboards and see what I get.

corngood commented 6 years ago

I've only been testing in Xorg. Do you need to run without it?

alexanderkjeldaas commented 6 years ago

rocm-smi now works with kernel 4.15.0-rc3

I've rebased my branch with nixpkgs head and added a few updates.

corngood commented 6 years ago

That's good. Is OpenCL working? Anything I can help with?

alexanderkjeldaas commented 6 years ago

No, actually I was wrong: /dev/kfd doesn't work with that kernel. rocm-smi does work, but clinfo doesn't.

On Thu, Dec 14, 2017 at 9:33 PM, David McFarland notifications@github.com wrote:

That's good. Is opencl working? Anything I can help with?
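The split between rocm-smi working and clinfo failing is consistent with rocm-smi reading sysfs while the OpenCL stack goes through the /dev/kfd device node, although that explanation is my assumption and not something confirmed in this thread. A minimal sketch for checking the node follows; `check_kfd` is a hypothetical helper name.

```shell
#!/bin/sh
# Report the state of the KFD device node: "missing", "accessible",
# or "no-permission". The optional argument is a hypothetical override
# of the node path.
check_kfd() {
    kfd_node="${1:-/dev/kfd}"
    if [ ! -e "$kfd_node" ]; then
        echo "missing"
    elif [ -r "$kfd_node" ] && [ -w "$kfd_node" ]; then
        echo "accessible"
    else
        echo "no-permission"
    fi
}

check_kfd
```

"missing" points at the kernel side (no KFD support or the driver did not bind), while "no-permission" points at device node ownership or group membership.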