ProjectPythia / pythia-foundations

Jupyterbook source for the Foundations collection
http://foundations.projectpythia.org
Apache License 2.0

Pythia Foundations Environment #56

Open jukent opened 3 years ago

jukent commented 3 years ago

Should we create a .yaml file with all the packages necessary to run through the Foundations course?

brian-rose commented 3 years ago

We already have an environment.yml file in the root of the foundations repo that will have everything included.

Is the idea to create a simpler environment just containing run-time dependencies but excluding the jupyter-book stuff needed to render the book itself?

brian-rose commented 3 years ago

Revisiting this... I just submitted PR #70 to add jupyterlab to the environment, because I'm using this environment for authoring notebook content and I want to be able to run in the lab.

It raises a question that I guess I'm not clear on: is our environment.yml meant to be just a minimal environment for building the book only (as used on CI services)?

It's possible that we could maintain two different environments:

The list of Python packages would be the same in both envs.

I would default to putting everything together in a single environment, but I wonder if others have different opinions about this.

clyne commented 3 years ago

I'd vote for keeping the environments as consistent as possible across all sites, intended uses, etc. Simpler is better.

brian-rose commented 3 years ago

So, one master environment to rule them all?

Currently we have at least three different environment.yml files across our repos:

clyne commented 3 years ago

Not sure how practical it is, but it would be nice if there were a single conda environment for all of Pythia, whether you are a user or a contributor.

brian-rose commented 3 years ago

I agree, from a user perspective. However, I don't think we can get away from needing an environment.yml file in every repo, to be used by CI services etc.

Maybe it's possible to set up a dependabot service to automatically keep all the environment.yml files in sync (e.g. opening PRs to update the files in other repos whenever we change one of them).

EDIT: I have no idea how to do this, it just sounds plausible.

dopplershift commented 3 years ago

I mean, you could always download an environment.yml from anywhere to use, manually as a step in CI.
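As a rough sketch of what that CI step might look like (the function name and the default destination are hypothetical, and the URL would be whatever location the shared file actually lives at):

```python
import urllib.request
from pathlib import Path


def fetch_environment_file(url: str, dest: str = "environment.yml") -> Path:
    """Download an environment file from `url` and save it to `dest`.

    Hypothetical helper: neither the name nor the default path comes
    from any Pythia repo.
    """
    dest_path = Path(dest)
    # urlopen handles http(s):// URLs as well as file:// URLs.
    with urllib.request.urlopen(url) as response:
        dest_path.write_bytes(response.read())
    return dest_path
```

A CI job could call this before creating the environment, at the cost of depending on the network and on the remote file staying where it is.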

But as I look at things, I don't see a problem with having the 3 different ones. Those serve 3 different purposes:

It's entirely wasteful, slows things down, and opens more opportunities for breakage to have the portal and dataset CI builds download and set up an entire NumPy, Pandas, Scipy, etc. environment every time we update some dataset or tweak the entirely non-Python portal site.

So I agree, favor simplicity--but I'd argue simplicity for keeping infrastructure working. You'd be amazed how often things break. Those environments are created a whole lot more often than (I hope) any of us are creating Pythia environments from scratch. Now, if there's not too much overhead, I'd be happy to see the environment.yml for this (foundations) repo kept up-to-date so that it has everything needed to contribute to any of the Pythia repos.

dopplershift commented 3 years ago

Regarding the original part of having a separate environment.yml that has only what the user needs (with jupyterlab) vs. the full documentation build stack, that certainly seems reasonable and again would reduce the support burden (i.e. picture debugging what could go wrong on user systems). One option would be to start CI by creating an env from environment.yml, then use a separate step to install our doc build dependencies (which could be in its own file).
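A minimal sketch of that two-step setup, assuming the doc build dependencies live in a second file. The file name `docs-environment.yml`, the environment name, and both helper functions are made up for illustration; only the `conda env create` and `conda env update` commands are real:

```python
import subprocess


def build_env_commands(env_name, base_file="environment.yml",
                       docs_file="docs-environment.yml"):
    """Return the two conda commands: create the user env, then add doc deps.

    `docs_file` is a hypothetical second file holding only the
    documentation build dependencies.
    """
    create = ["conda", "env", "create", "--name", env_name, "--file", base_file]
    update = ["conda", "env", "update", "--name", env_name, "--file", docs_file]
    return [create, update]


def run_env_commands(commands, dry_run=True):
    """Print the commands, or actually run them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True makes the CI step fail if conda exits with an error.
            subprocess.run(cmd, check=True)
```

Users would only ever run the first command; CI would run both.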

brian-rose commented 3 years ago

Good points @dopplershift, although I would quibble that our portal site will not be entirely non-Python, as it is built with sphinx. But certainly won't need numpy, cartopy, etc.

I think landing on the foundations environment.yml as the "all-in" environment, while keeping the other repos more bare bones, is a good compromise. And I think that environment needs to contain the full doc build dependencies, because we are trying to build tutorial materials around making modifications to the docs themselves (i.e. the Foundations book), so we want to provide users with an environment for not just running the examples, but also building the book.

clyne commented 3 years ago

Good points. The consistent environment.yml file between different repos is probably less important for users that won't be bouncing back and forth between repos as maintainers have to. Hopefully, the latter are more savvy and don't get tripped up by this (as much as I do:-)

ktyle commented 3 years ago

@dopplershift what's your advice on how best to keep an environment "up to date without breaking things"? When would we need to think about changing specific version requirements, such as =3.8 for python and <1.4 for sqlalchemy?

dopplershift commented 3 years ago

@ktyle If I'm keeping an environment specification unbroken and up-to-date, I'm using Dependabot and PyPI-style requirements.txt files (Dependabot doesn't yet support Conda 😢) with the version of every dependency explicitly listed ("pinned"). When there's an update, Dependabot issues a PR to update that version, which triggers CI to run whatever tests we have to validate. If it passes (e.g. all the notebooks run), the update is merged. If not, it can be examined further.
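To illustrate what "pinned" means here, a small sketch that flags any requirement without an exact `==` version. The function name is hypothetical and the parsing is deliberately simplified (it ignores option lines such as `-r other.txt` and does not handle every valid requirement syntax):

```python
import re

# A pinned line looks like "name==version", optionally with extras: "name[extra]==1.0".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines in `text` that are not pinned with `==`."""
    unpinned = []
    for raw in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip option lines (e.g. "-r base.txt").
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Run against a requirements.txt in CI, an empty result would mean every dependency has an exact version for Dependabot to bump.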

Unfortunately, for users to use such a file or files, the story is more complicated, since you no longer have a single file with environment name, conda channels, and dependencies.

I got really tired of PRs broken by unrelated changes from upstream package changes. Who knows, maybe this repo will be less sensitive than MetPy's suite of checks.

jukent commented 1 year ago

Should we close this?