DavHau / mach-nix

Create highly reproducible python environments
MIT License

R package reproducibility #313

Closed TyberiusPrime closed 2 years ago

TyberiusPrime commented 3 years ago

I can add R packages, rpy2, and R to my mach-nix derivation. Very cool. It doesn't quite work yet (import rpy2.robjects complains about a missing R_HOME), but that's probably down to the way I invoke mach-nix.

But how do I lock those versions down?

In my previous life I found that unless you use a specific Bioconductor release (and the R version that goes with it) and a specific date snapshot of CRAN (which Microsoft provides), you are going to go insane.

Any pointers on how to achieve this would be appreciated!

TyberiusPrime commented 3 years ago

packagesExtra = [pkgs.rWrapper] ++ (with mach-nix.rPackages; [edgeR]); solves the R_HOME problem.

Now, how do I get an R with the same packages?

Answer

let
  my_r_packages = with pkgs.rPackages; [ edgeR ];
  my_r = pkgs.rWrapper.override { packages = my_r_packages; };
in mach-nix.mkPython ... packagesExtra = [my_r] ++ my_r_packages;...

DavHau commented 3 years ago
let
  my_r_packages = with pkgs.rPackages; [ edgeR ];
  my_r = pkgs.rWrapper.override { packages = my_r_packages; };
in mach-nix.mkPython ... packagesExtra = [my_r] ++ my_r_packages;...

Do you really need the ++ my_r_packages despite the packages already being included in my_r?

I do not really use R packages myself. If you could help in contributing an improvement in the way mach-nix assembles R environments, that would be great.

Currently what mach-nix does to include R packages is to add them to buildInputs of the rpy2 package. See these lines https://github.com/DavHau/mach-nix/blob/d223656fc0eff33f4da77d69db19752edc9a5ba5/mach_nix/nix/mkPython.nix#L83-L85

As I understand from your issue, this is not enough to make things work. @InLaw, do you have any opinion on this?

TyberiusPrime commented 3 years ago

Yes, I do need them, because my_r does not declare an rCommand, so the mach-nix 'are there any R packages about' detection fails.
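
For readers who want the whole expression in one place, here is a minimal sketch of the workaround described above. It assumes `pkgs` and `mach-nix` are passed in by the caller, and the `requirements` string (just `rpy2`) is a placeholder chosen for illustration, not something taken from this thread.

```nix
# Sketch only: a function so that no particular nixpkgs or mach-nix
# revision is assumed. The caller supplies both.
{ pkgs, mach-nix }:
let
  # R packages to expose, both to R itself and to rpy2.
  my_r_packages = with pkgs.rPackages; [ edgeR ];

  # An R wrapper that already knows about those packages.
  my_r = pkgs.rWrapper.override { packages = my_r_packages; };
in
mach-nix.mkPython {
  # Placeholder requirements (assumption for this example).
  requirements = ''
    rpy2
  '';

  # The wrapper provides R; the raw package list is repeated so that
  # mach-nix still detects R packages in packagesExtra, as reported above.
  packagesExtra = [ my_r ] ++ my_r_packages;
}
```

Listing the packages twice is redundant from R's point of view; it is only there because of the detection behaviour described in this comment.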

I envy you. I had a deep dive into R packaging this afternoon (once again, into the breach...), and I found that while it's at least possible to get CRAN into a reproducible shape, thanks to daily mirroring by Microsoft, Bioconductor is a whole different beast.

The Bioconductor authors replace their PACKAGES description file within a release to update the z in x.y.z package versions, and they throw away the old package.x.y.z.tar.gz in favor of package.x.y.z+1.tar.gz.

RStudio has its own mirror. It actually keeps the old package.x.y.z.tar.gz in an archive, but it also always has the newest PACKAGES, and I can't find the archived versions. The interactive/API web system they built knows about them, though.

And don't get me started on Bioconductor release X, on date Y, after such a point update, referencing CRAN packages and versions that did not exist on date Y.

jbedo commented 3 years ago

Currently what mach-nix does to include R packages is to add them to buildInputs of the rpy2 package. See these lines

https://github.com/DavHau/mach-nix/blob/d223656fc0eff33f4da77d69db19752edc9a5ba5/mach_nix/nix/mkPython.nix#L83-L85

As I understand from your issue, this is not enough to make things work.

They have to go into propagatedBuildInputs.
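
To illustrate the difference, here is a minimal sketch in plain nixpkgs terms. It is not the mach-nix code linked above: it assumes nixpkgs' `python3Packages.rpy2` and its `overridePythonAttrs` helper, and uses edgeR only because it is the package from this thread.

```nix
# Sketch only: append R packages to rpy2's propagatedBuildInputs so they
# are carried into any environment that depends on rpy2, rather than
# being available only while rpy2 itself is built.
{ pkgs }:
let
  r_packages = with pkgs.rPackages; [ edgeR ];
in
pkgs.python3Packages.rpy2.overridePythonAttrs (old: {
  # Keep whatever rpy2 already propagates and add the R packages.
  propagatedBuildInputs = (old.propagatedBuildInputs or [ ]) ++ r_packages;
})
```

Inputs listed in buildInputs are not propagated to downstream environments, which would be consistent with the packages not being found at import time.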

TyberiusPrime commented 2 years ago

I've solved all my issues in my anysnake2 project, thanks for y'all's feedback!