conda-forge / staged-recipes

A place to submit conda recipes before they become fully fledged conda-forge feedstocks
https://conda-forge.org
BSD 3-Clause "New" or "Revised" License

Porting R channel to conda-forge #2009

Closed johanneskoester closed 7 years ago

johanneskoester commented 7 years ago

This issue will track the effort to port the R channel into conda-forge. Please feel free to join if you like.

How to contribute

We can now start migrating R packages from conda/conda-recipes.

Prerequisites

R packages to migrate (grouped into topologically sorted levels so that all r-package dependencies are satisfied)

Level 1

Cleanup script:

In order to clean up recipes created with conda skeleton cran, I apply the following script:

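#!/bin/sh
# Usage: cleanup.sh RECIPE_DIR
# Cleans up a recipe generated by "conda skeleton cran".
# Note: "sed -i" without a backup suffix is GNU sed syntax (BSD/macOS sed needs -i '').
set -e
[ -n "$1" ] || { echo "usage: $0 RECIPE_DIR" >&2; exit 1; }

RECIPE="$1/meta.yaml"
BUILDSH="$1/build.sh"
BUILDBAT="$1/bld.bat"

# Remove comment-only lines from meta.yaml
sed -i -e '/^ *#.*$/d' "$RECIPE"
# Collapse runs of blank lines into a single blank line
sed -i -e '/^$/N;/^\n$/D' "$RECIPE"

# Append the recipe maintainers
echo extra: >> "$RECIPE"
echo "  recipe-maintainers:" >> "$RECIPE"
echo "    - johanneskoester" >> "$RECIPE"
echo "    - bgruening" >> "$RECIPE"

# Overwrite the Windows build script with a minimal one
echo '"%R%" CMD INSTALL --build .' > "$BUILDBAT"
echo 'if errorlevel 1 exit 1' >> "$BUILDBAT"

# Overwrite the Unix build script with a minimal one
echo '#!/bin/bash' > "$BUILDSH"
echo '' >> "$BUILDSH"
echo '$R CMD INSTALL --build .' >> "$BUILDSH"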
johanneskoester commented 7 years ago

Thanks, @mjsteinbaugh

jakirkham commented 7 years ago

So what if we want to pin to CONDA_R=3.4 instead of CONDA_R=3.4.0?

When are you thinking of doing this?

johanneskoester commented 7 years ago

Within the next few weeks. Unfortunately, R 3.4.0 was just released, so it does not make much sense to update everything in bioconda against 3.3.2 in conda-forge. Instead, we think it is best to build against 3.4 directly. However, pinning to patch versions clearly does not scale in the long run (in terms of both storage and build time). Our evaluation suggests that it should be fine to pin to the minor version (i.e., 3.4, not 3.4.0). R prints a warning when a package was built under a newer R version than the one loading it, but in general this is considered to work. If individual packages need special treatment, their feedstocks can always use a more specific CONDA_R.

jakirkham commented 7 years ago

TBH I'm not sure to what extent conda-smithy/conda-build-all have control over this. If conda-build just uses the value of CONDA_R verbatim, then we can probably add some short hack with a simple test to conda-smithy for the interim. If you want to give this a try, then I'll find some time to take a look. In the long term we should discuss a way to do this with conda-build 3. If it turns out conda-build is doing something more sophisticated, then we might need to just move everything to conda-build 3 first.

Part of the reason I say this is that conda-build 3 is going to change a lot of stuff when it comes to pinning and doing matrix builds. Please take a look at PR https://github.com/conda/conda-docs/pull/414 for details if you haven't seen it yet. This will impact things like how we handle pinnings and matrix builds, and may even result in us dropping conda-build-all.

asmeurer commented 7 years ago

Curious what is your strategy to keep these feedstocks up-to-date? Is there a rerender-like script that can update them automatically?

ocefpaf commented 7 years ago

> Curious what is your strategy to keep these feedstocks up-to-date? Is there a rerender-like script that can update them automatically?

There is no such thing yet. I do know that @mingwandroid triggers updates with conda skeleton, is that correct @mingwandroid?

I had a GSoC student with a nice proposal to create a conda-smithy update, but he did not make the NumFOCUS cut. Luckily, he is still interested in implementing it regardless of GSoC. More soon... But that is not a simple job. So not too soon :stuck_out_tongue_winking_eye:

asmeurer commented 7 years ago

Presumably this wouldn't be too difficult to do by reusing the existing rerender tooling.

mingwandroid commented 7 years ago

An old msys2 contributor had a tool that looked for version updates and modified the PKGBUILDs accordingly.
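A minimal sketch of the core check such an update tool performs, written in the same shell style as the cleanup script above. It assumes GNU coreutils (for sort -V); the function name version_newer and the hard-coded versions are hypothetical, and fetching the current version from CRAN is not shown.

```shell
#!/bin/sh
# version_newer A B: succeed (exit 0) if version A sorts strictly after B.
# Relies on GNU "sort -V", which compares the numeric parts numerically,
# so CRAN-style versions such as 1.0-12 and 1.0-9 are ordered correctly.
version_newer() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Example: flag a recipe whose version lags behind the upstream one.
# Both values are placeholders; reading them from meta.yaml and CRAN
# is not shown here.
recipe_version="1.0-9"
upstream_version="1.0-12"
if version_newer "$upstream_version" "$recipe_version"; then
    echo "update available: $recipe_version -> $upstream_version"
fi
```

With the placeholder values it prints "update available: 1.0-9 -> 1.0-12".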

I find the dependencies change a bit too, and there's some awfulness regarding build numbers for some old R packages in defaults that no longer see new versions, which you need to be very careful about as well (this had to do with adding license fields). Some of those packages may not be used by anyone anymore, and there's an argument for dropping them.

Currently, recreating them all with conda skeleton is automated, but each one is then compared manually (using Beyond Compare).

I'd love to see something in place to figure out the most popular packages on CRAN and make recipes for them too.

asmeurer commented 7 years ago

Something that I had thought about doing back when I was maintaining the R packages, but never got around to, was somehow making it so that the base recipe was completely auto-generated from conda skeleton, and any modifications on top of that recipe (most commonly, additional build steps for Windows) would be in separate files. With this workflow, you could regenerate the recipes with conda skeleton, and keep the modifications on top of that separate (and, obviously, update skeleton itself as much as possible to keep what can be automated automated). This sort of thing might require some modifications to conda-build. I don't know if it's possible to merge two files to make a meta.yaml.

mingwandroid commented 7 years ago

Since I added the msys2 infrastructure to the Anaconda Distribution, there's no longer anything custom about Windows, but the idea of isolating whatever custom stuff remains in something like a normal patch file is interesting.

...then again, as a fan of git, I'd propose keeping those custom modifications in a specially named branch; you'd then just rebase it on top after each update and fix any conflicts. Maybe conda skeleton could help with that too?

Merging two files to make a meta.yaml doesn't address build.sh / bld.bat.
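A rough sketch of the patch-file idea, assuming GNU diff and patch: the hand edits are saved once as a unified diff between the skeleton output and the maintained recipe, then re-applied after each regeneration. The helper names save_local_patch and apply_local_patch are hypothetical.

```shell
#!/bin/sh
# save_local_patch GENERATED MAINTAINED PATCHFILE
# GENERATED and MAINTAINED must be sibling directories, so the patch paths
# look like "<dir>/meta.yaml" and "patch -p1" can strip the "<dir>/" part.
# -N makes files added by hand (e.g. a custom build script) part of the patch.
save_local_patch() {
    parent=$(dirname "$1")
    [ "$parent" = "$(dirname "$2")" ] || return 2
    status=0
    (cd "$parent" && diff -ruN "$(basename "$1")" "$(basename "$2")") > "$3" || status=$?
    # diff exits 1 when the trees differ (the expected case) and 2 on error.
    [ "$status" -le 1 ]
}

# apply_local_patch REGENERATED PATCHFILE
# Re-applies the saved hand edits to a freshly regenerated recipe directory.
apply_local_patch() {
    patch -d "$1" -p1 < "$2"
}
```

Hunks that no longer apply after an upstream change are rejected by patch and still need manual attention, much like conflicts in the branch-based approach.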

asmeurer commented 7 years ago

> Since I added the msys2 infrastructure to the Anaconda Distribution there's no longer anything custom about Windows, but the idea of isolating any custom stuff that there is to something like a normal patch file is interesting.

Maybe R itself doesn't, but certain packages will. I remember some packages, especially those with C extensions, requiring "manual" steps to build on Windows. I am curious, though, which packages, if any, required additional modifications beyond skeleton.

> ...then again, as a fan of git, I'd propose having those custom modifications in a specially named branch, then you'd just rebase that on top after each update and fix any conflicts. Maybe conda skeleton could help with that too though?

I don't like the idea of using a separate git branch here. All the data needed to reproduce and build a recipe should be on the main branch.

> Merging two files to make a meta.yaml doesn't address build.sh / bld.bat.

Merging build.sh/bld.bat is easy, as those are scripting languages that can call out to other files. The skeleton-generated build.sh could check for a "custom_build.sh" in the recipe and run it if present, falling back to the default build commands otherwise.
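A minimal sketch of that fallback, assuming the hand-maintained file is called custom_build.sh (the name proposed above) and relying on the RECIPE_DIR and R variables that conda-build provides to the build script (the original build.sh above already uses $R). The helper names custom_build_script and run_build are hypothetical.

```shell
#!/bin/bash
# Sketch of a skeleton-generated build.sh that defers to an optional,
# hand-maintained custom_build.sh kept in the recipe directory.

# custom_build_script DIR: print DIR/custom_build.sh if it exists, else nothing.
custom_build_script() {
    if [ -f "$1/custom_build.sh" ]; then
        printf '%s\n' "$1/custom_build.sh"
    fi
}

run_build() {
    script=$(custom_build_script "$1")
    if [ -n "$script" ]; then
        bash "$script"                  # hand-written build steps
    else
        "$R" CMD INSTALL --build .      # default skeleton build
    fi
}

# Only build when actually invoked by conda-build (which sets RECIPE_DIR).
if [ -n "${RECIPE_DIR:-}" ]; then
    run_build "$RECIPE_DIR"
fi
```

Keeping the check in a separate function means the generated part never has to change when a recipe gains or loses its custom script.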

mingwandroid commented 7 years ago

> Maybe R itself doesn't, but certain packages will. I remember some packages, especially those with C extensions, require "manual" steps to build on Windows. I am curious, though, which packages, if any, did require additional modification beyond skeleton.

Adding packages that require compilation takes a bit of work on all platforms, but it's no worse and no more hacky on Windows than on macOS or Linux.

johanneskoester commented 7 years ago

@jakirkham yes, as far as I know conda-build takes the value of CONDA_R verbatim. So if it is 3.4, it will just pin to 3.4 and all the patch versions should work. I would have thought that there is some way to tell conda-smithy: "Update the matrix of all feedstocks with CONDA_R to this value". Is that not the case? If not, how do you update Python?

jakirkham commented 7 years ago

Good to know. Adding a new version isn't an issue; that will happen once a new r-base is released and feedstocks are re-rendered. The issue is that CONDA_R has been pinned to the full x.y.z version, and if we aren't going to do that anymore, we need to change the logic in conda-smithy and/or conda-build-all. Does that make sense?

johanneskoester commented 7 years ago

Yes, makes sense. So basically it should simply determine the latest r-base x.y.z and set CONDA_R to x.y.
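A small sketch of that rule, assuming the available r-base versions are already at hand as one version per line (how conda-smithy would obtain them is not shown) and that GNU sort -V is available. The function names latest_version and conda_r_pin are hypothetical.

```shell
#!/bin/sh
# latest_version: read versions (one per line) on stdin and print the newest.
# Relies on GNU "sort -V" for numeric-aware version ordering.
latest_version() {
    sort -V | tail -n 1
}

# conda_r_pin X.Y.Z: print only the major.minor part, e.g. 3.4 for 3.4.0.
conda_r_pin() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Example with a placeholder version list.
latest=$(printf '3.3.1\n3.3.2\n3.4.0\n' | latest_version)
echo "CONDA_R=$(conda_r_pin "$latest")"
```

With the placeholder list the script prints CONDA_R=3.4; a feedstock needing a stricter pin would simply override this value.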