everpub / openscienceprize

:telescope: Everpub - Making reusability a first class citizen in the scientific workflow.

What about also adding conda environments to the mix? #8

Open tritemio opened 8 years ago

tritemio commented 8 years ago

Docker containers are great, but oftentimes the software is simple enough that the environment can be reproduced with conda. This approach also has the benefit of being multi-platform.

It is a slightly more fragile approach, but I think it is good enough in many cases and a big improvement over manual installation.

In the spirit of "not changing your tools", it makes sense to add conda to the mix.

betatim commented 8 years ago

Good point. Also thinking about vagrant.

Could you elaborate on multi-platform? For me docker solves this very nicely, in the sense that it works on linux, windows and mac (the latter two need virtualbox but fine).

I don't use conda envs enough for sharing environments but someone once told me that conda env export > environment.yml would not produce something that can be easily shared across platforms. The only data point I have to contribute is that several third-party conda recipes aren't perfect and break on platforms that aren't linux.
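One concrete reason an exported file may not travel well is that `conda env export` pins build strings (the third field in `name=version=build`), and those are typically specific to one platform. The following is a minimal sketch of stripping them, assuming dependency lines follow that three-field format; the function name `strip_build_strings` is hypothetical. Newer conda releases offer `conda env export --no-builds`, which has the same effect. Note that this does not help when a package simply does not exist on the other platform.

```python
def strip_build_strings(environment_yml: str) -> str:
    """Drop the trailing build string from conda dependency lines.

    Assumes conda dependency lines look like "  - name=version=build".
    pip-style lines ("name==version") and all other lines pass through
    unchanged.
    """
    cleaned = []
    for line in environment_yml.splitlines():
        entry = line.lstrip()
        is_conda_pin = (
            entry.startswith("- ")
            and entry.count("=") == 2
            and "==" not in entry
        )
        if is_conda_pin:
            # Keep "name=version", discard "=build".
            line = line.rsplit("=", 1)[0]
        cleaned.append(line)
    return "\n".join(cleaned)


# Illustrative input only; the package pins are made up for the example.
exported = """name: paper
dependencies:
  - numpy=1.10.4=py35_0
  - python=3.5.1=0
  - pip:
    - emcee==2.1.0
"""
print(strip_build_strings(exported))
```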

My favourite approach is to use conda inside docker. I have a shell script which does the equivalent of source activate this-projects-env but drops me into the docker container. This is one of these "trivial" tools that should be part of what this project produces and then publicises.
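To make the idea of such a wrapper concrete, here is a rough sketch in Python rather than shell. It is not the script mentioned above; the function names, the `/work` mount point, and the image name in the usage comment are all assumptions chosen for illustration. It only relies on standard `docker run` options (`--rm`, `-it`, `-v`, `-w`).

```python
import os
import subprocess


def docker_shell_command(image, project_dir=".", shell="bash"):
    """Build the argv for an interactive shell inside a project container.

    The project directory is mounted at /work (an arbitrary choice) and
    used as the working directory, so files edited inside the container
    are visible on the host.
    """
    project_dir = os.path.abspath(project_dir)
    return [
        "docker", "run",
        "--rm",                       # remove the container on exit
        "-it",                        # interactive terminal
        "-v", f"{project_dir}:/work",
        "-w", "/work",
        image,
        shell,
    ]


def enter_project(image, project_dir="."):
    """Drop into the container; returns the shell's exit code."""
    return subprocess.call(docker_shell_command(image, project_dir))


# Usage (hypothetical image name), analogous to `source activate`:
#   enter_project("this-projects-env")
```

Separating command construction from execution keeps the wrapper easy to test without a running Docker daemon.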

tritemio commented 8 years ago

In general, conda environments are platform-specific only if you depend on packages that are available on only one platform. Otherwise they should work across OS X, Windows and Linux.

As a not completely trivial (but simple) example, I created a demo on mybinder using only a conda environment, and the environment was not created on Linux:

https://github.com/Photon-HDF5/Photon-HDF5-Converter

I currently don't run virtual machines but work on Windows, OS X and Linux using only conda. It is a lightweight approach that works if you mostly use standard and home-built packages.

tritemio commented 8 years ago

Found this slide (and the next 2 going down) on using snakemake with conda:

http://slides.com/johanneskoester/snakemake-broad-2015#/23/3

ctb commented 8 years ago

-1 on specifically saying we'll support conda in the prototype; +1 on saying that we're open to including it.

betatim commented 8 years ago

From talking to people it seems the pragmatists say: conda is nice but (often) isn't enough so you need something like docker (mainly related to lots of software not having conda packages but you can install them on a linux OS pretty easily). So I would vote for us saying "docker is the starting point, inside it you (the scientists) can do as you please"

tritemio commented 8 years ago

On Sat, Feb 27, 2016 at 2:36 PM, Tim Head notifications@github.com wrote:

> From talking to people it seems the pragmatists say: conda is nice but (often) isn't enough so you need something like docker (mainly related to lots of software not having conda packages but you can install them on a linux OS pretty easily). So I would vote for us saying "docker is the starting point, inside it you (the scientists) can do as you please"

I use conda environments on 3 platforms for python code+cython extensions and did not encounter any fundamental problem. The only issue I had with Anaconda is the occasional (temporary) breakage of packages so that you have to revert to an old version. But this does not affect environments: if it worked once it will keep working.

If you depend on C/C++ libraries not included in Anaconda then yes, there are better tools. But for the majority of researchers python + R are enough. Aren't those people the main target of this proposal?

I think docker is great but I fear that making it absolutely necessary will complicate the simple workflow. You need to set up virtual machines at the very minimum unless you run linux, which is unlikely for entry-level users.

For the current proposal I don't think this detail is an issue but, in general, I would rather have the option to set up a "paper" using simple conda environments.

betatim commented 8 years ago

On Sun, Feb 28, 2016 at 1:40 AM Antonino Ingargiola <notifications@github.com> wrote:

> On Sat, Feb 27, 2016 at 2:36 PM, Tim Head notifications@github.com wrote:
>
> > From talking to people it seems the pragmatists say: conda is nice but (often) isn't enough so you need something like docker (mainly related to lots of software not having conda packages but you can install them on a linux OS pretty easily). So I would vote for us saying "docker is the starting point, inside it you (the scientists) can do as you please"
>
> I use conda environments on 3 platforms for python code+cython extensions and did not encounter any fundamental problem. The only issue I had with Anaconda is the occasional (temporary) breakage of packages so that you have to revert to an old version. But this does not affect environments: if it worked once it will keep working.
>
> If you depend on C/C++ libraries not included in Anaconda then yes, there are better tools. But for the majority of researchers python + R are enough. Aren't those people the main target of this proposal?

In my biased world view (coming from particle physics), nothing works without supporting large and/or custom C++ environments. A lot of people (for good and bad reasons) write large parts of their paper pipeline in C++, and to access our data, which is stored in a custom binary format, you need ROOT. So given that we will have to spawn docker containers for people anyway, from an ops point of view I'd suggest we allow people to modify those as well.

> I think docker is great but I fear that making it absolutely necessary will complicate the simple workflow. You need to set up virtual machines at the very minimum unless you run linux, which is unlikely for entry-level users.

I fully agree that conda create -n nobelprize python is easier than getting started with and executing things inside a container. This is why I think we need to build some commandline tools/UI so that it becomes as easy.

ctb commented 8 years ago

> > I think docker is great but I fear that making it absolutely necessary will complicate the simple workflow. You need to set up virtual machines at the very minimum unless you run linux, which is unlikely for entry-level users.
>
> I fully agree that conda create -n nobelprize python is easier than getting started with and executing things inside a container. This is why I think we need to build some commandline tools/UI so that it becomes as easy.

Absolutely - I still expect to see many people using their own paper environment and I would like to give them the tools to encapsulate their paper execution and rendering environment in such a way that it can be run in a clean environment. Docker can be our demo and cloud environment but it should not be the only way you can run things!
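One way to picture "Docker as one option among several" is a thin layer that wraps the same paper command for different backends. The sketch below is purely illustrative and not part of any agreed design: the function name `wrap_command`, the backend labels, and the example targets are assumptions. The conda branch additionally assumes a conda version that provides `conda run`.

```python
def wrap_command(backend, target, command):
    """Return the argv that runs `command` inside the chosen environment.

    backend: "conda" (target is an environment name) or
             "docker" (target is an image name).
    """
    command = list(command)
    if backend == "conda":
        # Assumes a conda release that ships the `conda run` subcommand.
        return ["conda", "run", "-n", target] + command
    if backend == "docker":
        return ["docker", "run", "--rm", target] + command
    raise ValueError(f"unknown backend: {backend!r}")


# The same (hypothetical) build step, wrapped for two backends.
build_step = ["python", "make_figures.py"]
print(wrap_command("conda", "paper-env", build_step))
print(wrap_command("docker", "paper-image", build_step))
```

Keeping the backend behind one function means further options could be added later without changing how a paper's steps are described.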

khinsen commented 8 years ago

+1

We need to start with something concrete, which is Docker containers. But we can add other options later. In fact, we have to, if we want this to last longer than the latest fad in computing technology.