CLIMADA-project / climada_python

Python (3.8+) version of CLIMADA
GNU General Public License v3.0

Dependency Management: Fixing the (Conda?) Environment #109

Closed zedrdave closed 3 years ago

zedrdave commented 3 years ago

From my (limited) understanding, the current codebase is tied to geopandas 0.6.1, which is slowly approaching obsolescence relative to other downstream packages (e.g. PROJ 6+ changing the way CRS are handled). In the meantime, it generates a lot of FutureWarning messages.

From a cursory look at the code, it seems like it might be possible to make the code compatible with some superficial changes to CRS-related code. Are there other known issues holding this version bump back?
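
One way to scope how "superficial" the changes would be is to record every FutureWarning a code path raises and group them by source location. A minimal sketch using only the standard library; `legacy_crs_call` is a hypothetical stand-in for code that triggers a CRS deprecation, not an actual CLIMADA or geopandas function, and its warning text is illustrative:

```python
import warnings
from collections import Counter


def legacy_crs_call():
    """Hypothetical stand-in for code that uses a deprecated CRS idiom."""
    warnings.warn("'+init=<authority>:<code>' syntax is deprecated", FutureWarning)
    return "epsg:4326"


def collect_future_warnings(func, *args, **kwargs):
    """Run func and count the FutureWarnings it raises, keyed by (file, line, message)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let the default filter hide repeats
        result = func(*args, **kwargs)
    counts = Counter(
        (w.filename, w.lineno, str(w.message))
        for w in caught
        if issubclass(w.category, FutureWarning)
    )
    return result, counts


result, counts = collect_future_warnings(legacy_crs_call)
for (filename, lineno, message), n in counts.items():
    print(f"{filename}:{lineno}: {message} (x{n})")
```

Running the test suite through a wrapper like this (or with warnings escalated to errors) would give a rough list of call sites to touch before bumping the pin.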

I'll be happy to take a stab at a PR, if that seems a good idea.

mmyrte commented 3 years ago

Hi Dave, you're absolutely right to want this; I did, too. As a hackish solution for the moment, you might have a look at my fork's develop branch, which was upgraded to be "bleeding edge" two months ago.

Keep in mind that conda will never be as progressive as PyPI, because there are often not only minimum version requirements, but also maximum constraints. This arguably makes package ecosystems more stable, at the cost of being a complete pain to install. As @emanuel-schmid mentioned on another issue, system dependencies such as GEOS are easier to install via conda for most of our users.
While we generally agreed to stick with conda, you could use brew for the system deps. I personally have sufficient faith in the unit and integration tests that I would use the resulting installation if the tests ran smoothly.

If you don't mind, I would like to hijack/rename this thread to organise a general upgrade - since forcibly upgrading only geopandas to >0.8 is very likely to break a lot of stuff.

zedrdave commented 3 years ago

@mmyrte Thanks for enlightening me on possible reasons to be cautious. As it happens, I also posted in a thread about reliance on Conda.

Given Climada's scope and requirements, it might make more sense to use a sophisticated package manager like pipenv or poetry (most of the same benefits as Conda, without the propensity to wreak havoc on other virtual envs). But I know how tiresome it is to have to continuously chase the latest fad in package managers, and could understand the reluctance to engage in yet another migration.

For the standard reasons, I am not able/willing to install Conda on prod machines, but have managed to get Climada running fine with a manually created virtual env (pyenv+pipenv) that currently reproduces the exact Conda version combo. Taking care of external dependencies is indeed easier if you have a tool like apt or Homebrew (not so much on Windows… so I can understand why Conda would be seen as the best compromise).
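
For anyone reproducing a version combo by hand, a quick check that the installed packages still match the pins can catch drift early. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); `find_mismatches` and `PINS` are hypothetical names, and the pin list is illustrative rather than CLIMADA's actual environment file:

```python
from importlib import metadata


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (pinned, installed)} for every package that deviates from its pin.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Illustrative pins only; geopandas 0.6.1 is the version mentioned in this thread.
PINS = {"geopandas": "0.6.1"}

for name, (pinned, installed) in find_mismatches(PINS).items():
    print(f"{name}: pinned {pinned}, installed {installed or 'missing'}")
```

The `get_version` parameter exists only so the check can be exercised without touching the real environment.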

I wasn't aware that the version being held back was a consequence of Conda's version availability… That being said, in the time it would take to produce a fully working and tested upgrade, a satisfying version combo should certainly be available through Conda too?

"If you don't mind, I would like to hijack/rename this thread to organise a general upgrade - since forcibly upgrading only geopandas to >0.8 is very likely to break a lot of stuff."

By all means. And as I've said before: I'd be happy to contribute in any way that makes sense. My knowledge of Climada is still pretty limited, but I have fairly solid experience with large Python projects (I'm trying to skirt the line between helpful and annoyingly clueless newly-arrived-on-a-project).

mmyrte commented 3 years ago

I'm not at all calling the shots on this project, I'm just a master's student. If I were, however, I'm pretty sure that it would be very valuable to rethink our reliance on conda. Iff there is an alternative that makes it trivial for beginners to install the system deps, then that would be great. (I've just run into problems again with conda; it claims that Python packages are incompatible with Python 3.5 through 3.9, which is patently absurd.)

I think it's worth giving you the background of our target audience:

Maybe there is a way to maintain several environment files to accommodate all use cases. At least for those of us who regularly work with climada, it's become obvious that there are many problems with conda. I'd rather rely on @emanuel-schmid to decide whether that's down to our complex dependencies, or due to conda.

zedrdave commented 3 years ago

@mmyrte Agreed on the need to cover a number of different audiences, with different thresholds for installation effort vs maintenance vs other technical concerns.

There are 2 services generally provided by standard packaging tools:

  1. handling sub-dependencies automatically
  2. guaranteeing perfect reproducibility of a given dependency tree snapshot (.lock files)

PyPI/pip does provide 1, but not really 2.

Poetry and pipenv generally add the second, with the ability to sandbox if necessary, to prevent dependency conflicts.

Conda is also geared more toward the second, with a rather heavy-handed approach, and has the added benefit that it can handle external dependencies (removing the need for a separate install through apt or brew). Unfortunately, it does not do clean sandboxing and therefore doesn't play nicely with other projects.
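
To make the difference between services 1 and 2 concrete, here is a toy sketch. Everything in it is hypothetical: `INDEX` is a made-up package index, `resolve` naively picks the newest version of each package and ignores version constraints and conflicts, and `write_lock` is not the lock format of pip, pipenv, poetry, or conda.

```python
import hashlib
import json

# Hypothetical package index: name -> {version: direct dependencies}.
INDEX = {
    "app-deps": {"1.0": ["geo", "numlib"]},
    "geo": {"0.6": ["numlib"], "0.8": ["numlib", "proj"]},
    "numlib": {"1.1": []},
    "proj": {"6.0": []},
}


def resolve(root, index):
    """Service 1: walk sub-dependencies, picking the newest version of each package."""
    resolved = {}
    stack = [root]
    while stack:
        name = stack.pop()
        if name in resolved:
            continue
        version = max(index[name], key=lambda v: tuple(int(p) for p in v.split(".")))
        resolved[name] = version
        stack.extend(index[name][version])
    return resolved


def write_lock(resolved):
    """Service 2: freeze the resolved tree into a deterministic, hashable snapshot."""
    body = json.dumps(resolved, sort_keys=True)
    return {"packages": resolved, "sha256": hashlib.sha256(body.encode()).hexdigest()}


lock = write_lock(resolve("app-deps", INDEX))

# A new upstream release changes what service 1 alone would install...
INDEX["numlib"]["1.2"] = []
assert resolve("app-deps", INDEX)["numlib"] == "1.2"
# ...while the lock file still describes the original tree.
assert lock["packages"]["numlib"] == "1.1"
```

Installing from the lock rather than re-resolving is what makes two installs on different days identical.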

In theory, I don't think it would be impossible to support many of these packagers simultaneously (at least for a while):

Does this make sense?

mmyrte commented 3 years ago

I'm so sorry for not getting back earlier – I was busy finishing my thesis etc. In the meantime, @emanuel-schmid has done a lot of work on the dependency side of things, though I don't know what exactly. (See the issues under dependencies: #167 #161 #159 #158 #157 #107).

Re:

[…] I don't think it would be impossible to support many of these packagers simultaneously […] Does this make sense?

It does make sense to me, but it's a question that IMHO needs to be answered by one single person; I think that's Emanuel. I'm currently also working with CLIMADA in an operational context, so I'd be interested in a stable-but-slightly-unfriendly solution. We could branch and regularly pull from upstream as a last resort. (Tagging @bguillod so he knows about this.)

I'm closing this issue, since the dependency discussion obviously did not get consolidated here.

PS: I'm sorry for hijacking your original issue, but it looks as though the geopandas upgrade you wanted is taking place in version 2.