ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Migrating installing deps from conda-forge to PyPi #1107

Closed valeriupredoi closed 5 years ago

valeriupredoi commented 5 years ago

We should migrate as many deps as possible from conda-forge to PyPi (since PyPi is more stable and robust). I started off with an old gripe of mine, iris. Turns out installing iris>=2.2 from PyPi in a conda virtual environment is much harder than I thought. I got it working in a test base environment with this recipe:

---
name: test_basic
channels:
  - conda-forge

dependencies:
  #- matplotlib<3
  - pip=18   # revert from 19 due to PEP517 error with cartopy
  - six      # will not install automatically via pip
  - proj4    # will not install automatically via pip
  - pyke     # will not install automatically via pip
  - cartopy  # pip install: error: command 'gcc' failed with exit status 1 
  - udunits2 # libudunits2.so.0: cannot open shared object
  - pip:
    - scitools-iris>=2.2

but it's not elegant. I opened an issue, https://github.com/SciTools/iris/issues/3321, to see what the iris folk have to say about it (I might be completely wrong and there is a much easier way to do it, who knows). @bjlittle or @bouweandela or @zklaus pls chip in!

Also there are other deps that can be installed from PyPi and are from conda at the mo, we should think about those too. :beer:
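To see at a glance how a recipe like the one above divides its packages between conda and pip, a small helper can walk the `dependencies` list. This is only a sketch: `split_dependencies` is a hypothetical name, and the dict below is the first recipe from this thread written out by hand as a YAML parser would be expected to return it.

```python
def split_dependencies(env):
    """Return (conda_deps, pip_deps) from a parsed conda environment file."""
    conda_deps, pip_deps = [], []
    for entry in env.get("dependencies", []):
        if isinstance(entry, dict):
            # A nested mapping such as {"pip": [...]} lists the pip-installed packages.
            pip_deps.extend(entry.get("pip", []))
        else:
            # Plain strings are resolved by conda from the listed channels.
            conda_deps.append(entry)
    return conda_deps, pip_deps


# The test_basic recipe from this thread, written out as a plain dict (comments dropped).
test_basic = {
    "name": "test_basic",
    "channels": ["conda-forge"],
    "dependencies": [
        "pip=18", "six", "proj4", "pyke", "cartopy", "udunits2",
        {"pip": ["scitools-iris>=2.2"]},
    ],
}

conda_deps, pip_deps = split_dependencies(test_basic)
print(len(conda_deps), "from conda:", conda_deps)
print(len(pip_deps), "from pip:", pip_deps)
```

Note that this only counts what the env file asks for explicitly; the transitive dependencies pulled in by either installer are not visible at this level.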

zklaus commented 5 years ago

Hm, I have a bit of a different take on the premise: It is generally difficult to install projects that contain binary code or have dependencies with binary code from PyPi. This is because in that case we rely on a build environment and things become brittle fast, even more so when users use modified LD_LIBRARY_PATH environment variables, as is often the case on supercomputers with and without the modules system.
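As a rough illustration of the LD_LIBRARY_PATH concern, the sketch below lists the entries of that variable that point outside a given environment prefix, since those are the directories from which a different build of a shared library could be picked up. The function names, the paths and the prefix are all made up for the example; only the standard `os` module is used.

```python
import os


def ld_library_path_entries(environ=None):
    """Return the non-empty entries of LD_LIBRARY_PATH as a list."""
    environ = os.environ if environ is None else environ
    raw = environ.get("LD_LIBRARY_PATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


def outside_prefix(entries, prefix):
    """Return the entries that do not live under the given environment prefix."""
    prefix = os.path.abspath(prefix)
    foreign = []
    for entry in entries:
        path = os.path.abspath(entry)
        if path != prefix and not path.startswith(prefix + os.sep):
            foreign.append(entry)
    return foreign


# Hypothetical setup: a module system has prepended its own netcdf directory.
fake_environ = {
    "LD_LIBRARY_PATH": os.pathsep.join(
        ["/opt/modules/netcdf/lib", "/home/user/conda/envs/esmvaltool/lib"]
    )
}
entries = ld_library_path_entries(fake_environ)
foreign = outside_prefix(entries, "/home/user/conda/envs/esmvaltool")
print("entries outside the environment:", foreign)
```

Calling `ld_library_path_entries()` without arguments reads the real environment instead of the fake one.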

In many ways this seems to be the raison d'être for conda since most of what it does could be achieved with virtualenvs for pure python packages.

Imho it is fine to have packages that are heavy on binary dependencies installed from conda and I would not suggest moving the iris installation. If there really is a problem with stability (maybe there is something that I missed?) we might rather consider having our own channel. But really, despite the recent hiccup with changed compilers, I think conda-forge is fine. Keep in mind that PyPi was never at risk of a similar problem simply because they rely from the get-go on the user to provide a build environment.

Of course, there is a place for PyPi installs and I would encourage every project including our own to provide PyPi packages, but for us that is more in the realm of expert users (Maybe you want a netcdf library built with your vendor mpi library?) who should be able to figure this out on their own or ask us for support. This way, mere mortals will still be able to install ESMValTool with relative ease.

tl;dr: Let's keep iris and similarly binary heavy packages as conda dependencies.

valeriupredoi commented 5 years ago

The reverse problem with picking up deps from conda is the brittle nature of its dependency tree, hence the failures to solve the environment, hence the need for pinning/unpinning/adding deps in the env file etc. This is something that @bouweandela and myself had to do quite a few times recently. PyPi offers stability in this respect: the more deps come from PyPi, the fewer come from conda, hence fewer possible problems wrt the complex dependency tree and conda's inability to solve the environment :beer:

zklaus commented 5 years ago

Just to be clear: I absolutely agree that it's nice to get things from PyPi. But for binary dependencies this turns out to be difficult, hence I would get these from conda by default.

What do you mean by

brittle nature of its dependency tree

? The dependency tree is the same, no matter the package manager, no?

valeriupredoi commented 5 years ago

yes the dependency tree is the same in terms of package names (and probably broad versions), but conda uses a lot of identifiers for establishing the conda-correct dependency tree shape: name, version, build, channel. These clash more often than for other package managers, and conda fails to solve an environment that, with more relaxed constraints, could probably be solved by other managers
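To make the four identifiers concrete, here is a deliberately simplified parser for the exact-match form `channel::name=version=build` that conda accepts on the command line. It is a sketch, not conda's own spec parser: `CondaSpec` and `parse_conda_spec` are hypothetical names, range operators such as `>=` are not handled, and the version and build string in the example are invented.

```python
from typing import NamedTuple, Optional


class CondaSpec(NamedTuple):
    channel: Optional[str]
    name: str
    version: Optional[str]
    build: Optional[str]


def parse_conda_spec(spec):
    """Split 'channel::name=version=build' into its parts (exact form only)."""
    channel = None
    if "::" in spec:
        channel, spec = spec.split("::", 1)
    parts = spec.split("=")
    name = parts[0]
    # Missing trailing fields are reported as None rather than guessed.
    version = parts[1] if len(parts) > 1 and parts[1] else None
    build = parts[2] if len(parts) > 2 and parts[2] else None
    return CondaSpec(channel, name, version, build)


# Hypothetical fully pinned spec: all four identifiers have to match at once.
pinned = parse_conda_spec("conda-forge::cartopy=0.17.0=py37_0")
# A loose spec constrains only the name and leaves the rest to the solver.
loose = parse_conda_spec("cartopy")
print(pinned)
print(loose)
```

A pip requirement, by comparison, carries only a name and a version constraint, which is one way of reading the point made above about conda having more ways to clash.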

zklaus commented 5 years ago

Well, as I said above, conda keeps track of the build environment (compiler, standard libraries, ...) and that means it has to have more comprehensive identifiers. But that isn't really any easier with pip---it is just ignored and burdened on the user. Of course, that is a completely valid approach but I think this is for experts and if we choose to go down this road, we should be conscious that things are not simpler with PyPi because it is better at dependency resolution or similar things, but because it doesn't attempt to solve this problem, instead leaving it to the user.
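One way a user can check part of that burden themselves is to ask Python whether the system's library search can locate the shared libraries a pip-installed package will need at import time. The sketch below uses `ctypes.util.find_library` from the standard library; `locate_shared_libraries` is a hypothetical helper and the library names are only examples taken from the recipes in this thread. The result depends on the platform and on how the loader is configured, so a missing entry does not prove the library is absent from a conda environment.

```python
import ctypes.util


def locate_shared_libraries(names):
    """Map each library name to what find_library reports, or None if not found."""
    return {name: ctypes.util.find_library(name) for name in names}


# Names are given without the "lib" prefix or the ".so" suffix.
report = locate_shared_libraries(["udunits2", "proj"])
for name, found in report.items():
    print(name, "->", found or "not found by the default library search")
```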

valeriupredoi commented 5 years ago

Well, as I said above, conda keeps track of the build environment (compiler, standard libraries, ...) and that means it has to have more comprehensive identifiers. But that isn't really any easier with pip---it is just ignored and burdened on the user. Of course, that is a completely valid approach but I think this is for experts and if we choose to go down this road, we should be conscious that things are not simpler with PyPi because it is better at dependency resolution or similar things, but because it doesn't attempt to solve this problem, instead leaving it to the user.

yes! solid point, man. But pip-ing surely makes our lives (as core devs) easier :grin: Anyways, what are others thinking of this: B-man, Javi, Bill?

valeriupredoi commented 5 years ago

In any case - I finally have an env file that now installs iris from pypi (working environment, the tool installs and runs fine): iris, matplotlib (version controlled by iris) and xarray from PyPi:

---
name: esmvaltool_pypi
channels:
  - conda-forge

dependencies:
  # Python packages that cannot be installed from PyPI:
  - esmpy
  - python-stratify
  # Non-Python dependencies
  - graphviz
  - cdo
  - imagemagick
  - nco

  # Multi language support:
  - python>=3.6
  - libunwind  # Needed for Python3.7+
  - ncl>=6.5.0
  - r-base
  - r-curl  # Dependency of lintr, but fails to compile because it cannot find libcurl installed from conda.
  - r-udunits2  # Fails to compile because it cannot find udunits2 installed from conda.
  # - julia>=1.0.0  # The Julia package on conda is apparently broken

  # for iris
  - pip=18   # revert from 19 due to PEP517 error with cartopy
  - proj4
  - pyke     # will not install automatically via pip
  - cartopy  # pip install: error: command 'gcc' failed with exit status 1 
  - udunits2 # libudunits2.so.0: cannot open shared object
  - pip:
    - six      # will not install automatically via pip
    - scitools-iris>=2.2
    - xarray>=0.12.0

valeriupredoi commented 5 years ago

adding anaconda as an optional channel helps a lot, as @bjlittle pointed out. This env file:

---
name: esmvaltool_pypi
channels:
  - anaconda
  - conda-forge

dependencies:
  # Python packages that cannot be installed from PyPI:
  - esmpy
  - python-stratify
  # Non-Python dependencies
  - graphviz
  - cdo
  - imagemagick
  - nco

  # Multi language support:
  - python>=3.6
  - libunwind  # Needed for Python3.7+
  - ncl>=6.5.0
  - r-base
  - r-curl  # Dependency of lintr, but fails to compile because it cannot find libcurl installed from conda.
  - r-udunits2  # Fails to compile because it cannot find udunits2 installed from conda.
  # for iris
  - pip
  - pyke     
  - cartopy  
  - udunits2 
  - pip:
    - scitools-iris>=2.2
    - xarray>=0.12.0

reduces the number of packages installed from conda-forge from 170 with our current setup to 67, with a lot more now straight off PyPi :beer:
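A per-channel count like the one quoted above can be reproduced from the output of `conda list --json`. The sketch below assumes that output is a JSON list of objects with a `channel` key and that pip-installed packages are reported under a channel named `pypi`; both are assumptions that may vary between conda versions. `count_by_channel` is a hypothetical helper and the sample data is made up.

```python
import json
from collections import Counter


def count_by_channel(conda_list_json):
    """Count installed packages per channel from `conda list --json` text."""
    packages = json.loads(conda_list_json)
    # Entries without a channel key are grouped under "unknown".
    return Counter(pkg.get("channel", "unknown") for pkg in packages)


# Made-up sample in the assumed shape of `conda list --json` output.
sample = json.dumps([
    {"name": "cartopy", "channel": "conda-forge"},
    {"name": "udunits2", "channel": "conda-forge"},
    {"name": "scitools-iris", "channel": "pypi"},
])
counts = count_by_channel(sample)
for channel, number in sorted(counts.items()):
    print(channel, number)
```

In practice the text would come from running `conda list --json` inside the activated environment and passing its standard output to the function.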

bjlittle commented 5 years ago

@valeriupredoi What more do you need to do to close this issue?

We could certainly make life slightly easier by removing the dependency of iris on pyke. Personally, I'd like to aim to do this in the next minor release of iris, 2.3.0, which I'd love to schedule sometime in the next quarter.

valeriupredoi commented 5 years ago

so far I think we're in good shape but the environment.yml has not changed yet on the main branch so let's keep this open until we change the env :beer:

valeriupredoi commented 5 years ago

closing this since I've re-tried the solution in here and it bloody don't work no more, courtesy to our beloved leader conda :angry: