NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License
18.11k stars 14.15k forks source link

R built with MKL computes incorrect results #104026

Closed alexvorobiev closed 1 year ago

alexvorobiev commented 3 years ago

Describe the bug I built R using MKL for both BLAS and LAPACK based on this new approach https://github.com/NixOS/nixpkgs/pull/83888. R works but computes incorrect results for matrix operations (tested on matrix multiplication).

To Reproduce Steps to reproduce the behavior:

let                                                                                                                                                                              
  pkgs = (import (builtins.fetchGit (import ./pinned-nixpkgs.nix)) ) {                                                                                                           
    config = {                                                                                                                                                                   
      allowUnfree = true;                                                                                                                                                        
      doCheckByDefault = false;                                                                                                                                                  
    };                                                                                                                                                                           

    overlays = [                                                                                                                                                                 
      (self: super: {                                                                                                                                                        
         blas = super.blas.override {                                                                                                                                            
           blasProvider = super.mkl;                                                                                                                                             
         };                                                                                                                                                                      
         lapack = super.lapack.override {                                                                                                                                        
           lapackProvider = super.mkl;                                                                                                                                           
         };                                                                                                                                                                      
      })                                                                                                                                                                         

      (self: super: rec {                                                                                                                                                        
         R = (                                                                                                                                                                   
           super.R.overrideAttrs (oldAttrs: rec {                                                                                                                                                                                                                                                                                                              
             doCheck = false;                                                                                                                                                    
             doInstallCheck = false;                                                                                                                                             
          })                                                                                                                                                                     
        );                                                                                                                                                                       
      })];                                                                                                                                                                       
  };                                                                                                                                                                             

in with pkgs; rec {                                                                                                                                                                                                                                                                                                                                           
    rEnv = rWrapper.override {                                                                                                                                                   
      packages = with rPackages; [                                                                                                                                               
        # to get/set the number of threads                                                                                                                                       
        mRMRe ];                                                                                                                                                                 
    };                                                                                                                                                                           
}

where pinned-nixpkgs.nix is:

{                                                                                                                                                                                
  name = "nixpkgs-unstable-2020-11-16";                                                                                                                                          
  url = "https://github.com/nixos/nixpkgs";                                                                                                                                      
  ref = "refs/heads/nixpkgs-unstable";                                                                                                                                           
  rev = "dd1b7e377f6d77ddee4ab84be11173d3566d6a18";                                                                                                                              
}          

Note that I had to turn the tests off (doCheck, doInstallCheck). With tests on the build freezes on testing the examples for stats package.

Here is the test. I ran the same matrix multiplication 1000 times and counted all the different results. I never had results greater than 30.

> res <- rep(0, 30)
> mat <- matrix(data = rep(1,30000), nrow = 10000, ncol = 3)
> vect <- c(1,-1,3)
> for (i in 1:1000) { r <- (mat %*% vect)[1]; res[[r]] <- res[[r]] + 1 }
> res
 [1]   0   0  26   0   0  51   0   0  82   0   0 143   0   0  96   0   0 114   0
[20]   0 146   0   0 342   0   0   0   0   0   0

The result is different every time. It has to be 3. This time only 26 out of 1000 were correct.

Expected behavior Here is the same code running in Microsoft R Open which also uses MKL:

> for (i in 1:1000) { r <- (mat %*% vect)[1]; res[[r]] <- res[[r]] + 1 }
> res
 [1]    0    0 1000    0    0    0    0    0    0    0    0    0    0    0    0
[16]    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0

Additional context I got the same correct result with blas/lapack set to either lapack-reference or openblasCompat. I wonder if other packages built with MKL have the same issue.

Notify maintainers (based on last commits to R) @matthewbauer @r-ryantm @matthiasbeyer @jabranham

Metadata

Maintainer information:

attribute:
  - R
  - mkl
# a list of nixos modules affected by the problem
module:
nixos-discourse commented 3 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/improving-nixos-data-science-infrastructure-ci-for-mkl-cuda/5074/50

alexvorobiev commented 3 years ago

I tested numpy built using the same overlay and got correct results, so this seems to be specific to R.

nixos-discourse commented 3 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/understand-nixpkgs-structure-semantic/10813/10

alexvorobiev commented 3 years ago

The current workaround I came up with is to disable the pluggable BLAS and add the proper link options recommended by Intel to R derivation. See https://github.com/NixOS/nixpkgs/pull/85636#issuecomment-765754129.

nixos-discourse commented 3 years ago

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/having-lots-of-trouble-with-r-development/12645/4

PhDyellow commented 3 years ago

I'm looking into this, but it isn't building for me. It says configure: error: unrecognized option: '-Wl,--no-as-needed'. Are you using R 4.1.1 yet? Can you share your full R default.nix?

alexvorobiev commented 3 years ago

Here it is for 4.1.0 https://gist.github.com/alexvorobiev/3f215d306cae7ed05c22f5c12dbc401c. I haven't switched to 4.1.1 yet.

jbedo commented 3 years ago

I tried on latest master and MKL produced the correct result.

stale[bot] commented 2 years ago

I marked this as stale due to inactivity. → More info

alexvorobiev commented 2 years ago

I tried on the latest master (R-4.2.1 with MKL) and got the correct result. I think this may be closed but I would be curious to know the explanation for the fixed behavior - was it new R, new MKL, or some changes in how nixpgks generalizes BLAS/LAPACK to use MKL?

jbedo commented 2 years ago

You could use git bisect to find the change that fixed it.