NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

Crash when a program uses OpenCL with AMD #305108

Open ilovethensa opened 7 months ago

ilovethensa commented 7 months ago

Describe the bug

All programs that try to use OpenCL crash.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Use this config:

       systemd.tmpfiles.rules = [
         "L+ /opt/rocm/hip - - - - ${pkgs.rocmPackages.clr}"
       ];
       # Enable ROCM on my RX 580
       environment = {
         variables = {
           ROC_ENABLE_PRE_VEGA = "1";
         };
         systemPackages = with pkgs; [
           clinfo
         ];
       };
       # Additional hardware configurations
       services.xserver.videoDrivers = ["amdgpu"];
       boot.initrd.kernelModules = ["amdgpu"];
       hardware.opengl = {
         enable = true;
         driSupport = true;
         driSupport32Bit = true;
         extraPackages = with pkgs; [
           rocm-opencl-icd
           rocm-opencl-runtime
         ];
       };

  2. Try to run hashcat.

Expected behavior

It works.

Screenshots

[Screenshot: 20240418_20h47m45s_grim]

Additional context

My whole config is here. Output from clinfo:

❯ clinfo
free(): invalid pointer

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

 - system: `"x86_64-linux"`
 - host os: `Linux 6.8.5, NixOS, 24.05 (Uakari), 24.05.20240412.cfd6b5f`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.2`
 - nixpkgs: `/etc/nix/path/nixpkgs`


2b commented 5 months ago

It seems like there is an issue with the ROCm driver for Polaris cards. The problem can be reproduced with version 6.0.2, but not with 5.7.1.

You can test it as follows:

export OCL_ICD_VENDORS=$(nix-build '<nixpkgs>' --no-out-link -A rocmPackages_5.clr.icd)/etc/OpenCL/vendors/

nix-shell -p clinfo

clinfo
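
The same test can also be kept in a small shell file so it is easy to repeat. This is only a sketch under a few assumptions: the file name shell.nix is arbitrary, and it relies on mkShell exporting extra attributes (here OCL_ICD_VENDORS) as environment variables inside the shell. The package names are the ones used in the commands above.

```nix
# shell.nix (file name is an assumption, not from the thread)
{ pkgs ? import <nixpkgs> { } }:

pkgs.mkShell {
  # clinfo is the only tool needed for the check
  packages = [ pkgs.clinfo ];

  # Point the OpenCL ICD loader at the ROCm 5 vendor directory only.
  # Swap rocmPackages_5 for rocmPackages to compare against the default ROCm.
  OCL_ICD_VENDORS = "${pkgs.rocmPackages_5.clr.icd}/etc/OpenCL/vendors/";
}
```

Running nix-shell in the directory that holds this file and then clinfo should be equivalent to the three commands above.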

So the workaround is to use ROCm v5 in your config:

hardware.opengl.extraPackages = with pkgs; [ rocmPackages_5.clr.icd ];
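
If the configuration also links /opt/rocm/hip, as in the original report, it may help to pin the ROCm package set in one place so the ICD and the HIP link cannot end up on different versions. A minimal sketch, assuming the option names from the configs in this thread; the local name rocm is hypothetical:

```nix
{ pkgs, ... }:

let
  # Single switch for the ROCm version; use pkgs.rocmPackages to go back to
  # the default set once the Polaris crash is fixed.
  rocm = pkgs.rocmPackages_5;
in
{
  # OpenCL ICD from the pinned ROCm set
  hardware.opengl.extraPackages = [ rocm.clr.icd ];

  # HIP link taken from the same set as the ICD
  systemd.tmpfiles.rules = [
    "L+ /opt/rocm/hip - - - - ${rocm.clr}"
  ];

  # Needed for pre-Vega cards such as the RX 580, per the original report
  environment.variables.ROC_ENABLE_PRE_VEGA = "1";
}
```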

griffi-gh commented 1 month ago

I don't know if it's related, but rocmPackages causes GPU hangs and crashes like the ones in the pictures below every time any application tries to use HIP or OpenCL. Using rocmPackages_5 instead of rocmPackages/rocmPackages_6 fixes this.

[Images: 20240924_212926_50, Screenshot_20240924_211642_Telegram]

ReturnRei commented 1 month ago

Similar issue here: version 6 resulted in a segfault. Version 5 "works", but any time I run clinfo or hashcat -I I get screen flickering.

GPU: AMD Radeon RX 570 Series

$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.6.56, NixOS, 24.05 (Uakari), 24.05.5709.c0b1da36f7c3`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.8`
 - channels(root): `"nixos-24.05"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

{ config, lib, pkgs, ... }:

{
  services.xserver.enable = true;
  services.xserver.videoDrivers = [ "amdgpu" ];
  # boot.initrd.kernelModules = [ "amdgpu" ];

  hardware.opengl = {
    enable = true;
    driSupport = true;
    driSupport32Bit = true;
    extraPackages = with pkgs; [
      amdvlk
      rocmPackages_5.clr.icd
      rocmPackages_5.clr
      rocmPackages_5.rocm-runtime
    ];
  };

  systemd.tmpfiles.rules = [
    "L+    /opt/rocm/hip   -    -    -     -    ${pkgs.rocmPackages_5.clr}"
  ];

  environment.variables = {
    ROC_ENABLE_PRE_VEGA = "1";
  };

  environment.systemPackages = with pkgs; [
    lact
    amdgpu_top
    rocmPackages_5.rocminfo
    vulkan-tools
  ];

  systemd.packages = with pkgs; [ lact ];
  systemd.services.lactd.wantedBy = [ "multi-user.target" ];
}

EDIT: In case it helps someone, this seems like a good enough quick fix. RustiCL seems promising:

{ config, lib, pkgs, ... }:

let
  # Note: pkgsUnstable is defined here but not referenced anywhere below.
  pkgsUnstable = import <unstable> { config = config.nixpkgs.config; };
in
{
  services.xserver.enable = true;
  services.xserver.videoDrivers = [ "amdgpu" ];
  boot.initrd.kernelModules = [ "amdgpu" ];

  hardware.opengl = {
    enable = true;
    driSupport = true;
    driSupport32Bit = true;
    extraPackages = with pkgs; [
      mesa.opencl
    ];
  };

  environment.variables = {
    RUSTICL_ENABLE = "radeonsi";
  };

  environment.systemPackages = with pkgs; [
    lact
    amdgpu_top
    vulkan-tools
  ];

  systemd.packages = with pkgs; [ lact ];
  systemd.services.lactd.wantedBy = [ "multi-user.target" ];
}
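
To try RustiCL before committing to a system rebuild, something like the following shell might work. It is an untested sketch with two assumptions: that the mesa.opencl output ships its ICD files under etc/OpenCL/vendors (which its use in hardware.opengl.extraPackages above suggests), and that mkShell exports the extra attributes as environment variables. The file name is hypothetical.

```nix
# rusticl-shell.nix (hypothetical file name)
{ pkgs ? import <nixpkgs> { } }:

pkgs.mkShell {
  packages = [ pkgs.clinfo ];

  # Only expose Mesa's OpenCL ICDs to the loader, leaving ROCm out entirely
  OCL_ICD_VENDORS = "${pkgs.mesa.opencl}/etc/OpenCL/vendors/";

  # Same setting as in the config above: enable RustiCL on the radeonsi driver
  RUSTICL_ENABLE = "radeonsi";
}
```

Entering it with nix-shell rusticl-shell.nix and running clinfo should show whether the card is picked up by RustiCL.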