hollandjg opened 2 weeks ago
FYI, there's discussion on "docs" vs. "doc" happening on a SPEC - it turns out "doc" is the standard for this, while "docs" is ~3x more popular. "test" is the standard for testing deps, and also the most popular (vs. "tests"). https://packaging.python.org/en/latest/specifications/core-metadata/#provides-extra-multiple-use
Anyway, I strongly disagree with option 2, as it's incorrect. It makes the environment hard to solve, sometimes impossible to solve. Solvers are very sensitive to the number of dependencies in an environment. There was a time when you couldn't install `black` and `tensorflow` in the same environment due to dependency conflicts. I've seen solvers choke for minutes[^1] trying to solve updates on massive "all-in-one" environments. And you never need to install your code formatter, or pre-commit, etc. in the same environment as your package; they are stand-alone tools!
Personally, removing `[dev]` and never even hinting that installing tools into a single environment is okay would be fine by me. `uv` is currently working on a solution for project-level "pipx"-like installs.
[^1]: On average. The record I've seen was 25 hours.
Also, this is a reason to use task runners. They handle all of this and keep a user from trying to make an "all-in-one" environment.
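The "stand-alone tools" point can be made concrete with pipx, which puts each tool in its own isolated environment so its pins never constrain your package's dependency solve. An illustrative sketch (tool names here are just examples):

```shell
# Each tool lives in its own venv, exposed on PATH, independent of any project
pipx install black
pipx install pre-commit

# uv offers a similar interface for tool installs
uv tool install black
```

Because the tools never share an environment with the package, the package's solve stays small and fast.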
> FYI, there's discussion on "docs" vs. "doc" happening on a SPEC - it turns out "doc" is the standard for this, while "docs" is ~3x more popular. "test" is the standard for testing deps, and also the most popular (vs. "tests"). https://packaging.python.org/en/latest/specifications/core-metadata/#provides-extra-multiple-use
Was not aware of that, thanks for the heads up!
> ... you never need to install your code formatter, or pre-commit, etc. in the same environment as your package, they are stand-alone tools! Also, this is a reason to use task runners. They handle all of this and keep a user from trying to make an "all-in-one" environment.
I think this is the best argument for task runners, and I think I'm starting to understand some more of the decisions you made when writing this lesson.
It seems to me that one of the functions of the `pyproject.toml` file is as a single source of truth for project dependencies (you have the build dependencies in there too, though in a separate section) and many tool configurations (`pytest`, `ruff`, etc.). So implicitly, the dependencies are in there, but if we only declare them as dependencies (and install them) in the task runner script, that feels surprising.
So where should those dependencies go? In the `pyproject.toml`? In `hatch` environments with their own separate dependencies? It seems like the `pyproject.toml` needs a standard way of declaring separate environments, distinct from the package extras, so that you don't have to rewrite the configuration each time you decide to change your toolset. (Discussion here: https://discuss.python.org/t/adding-a-non-metadata-installer-only-dev-dependencies-table-to-pyproject-toml/20106)
BTW, I'm not planning on attempting to merge any of the changes around the task runners and environment set up until we've reached consensus here! Thanks for your patience and all your comments!
There are task runners that use pyproject.toml (specifically, hatch, which is becoming very good). But having a Python-based runner is really great for two reasons: first, it's just running commands from Python, so there is very little DSL to learn and it clearly displays what it is doing, and second, if you need to put any logic in, writing it in Python vs. a hobbled "shell"-like configuration syntax is far better.
If you want to move to hatch, I'd not be against that. I've moved repo-review to hatch, and am currently using it for the Princeton NVIDIA hackathon, so I'm getting a feel for how people who've never used any tool like it like it. It's not as flexible as nox for complex tasks, and its built-in defaults are interesting. It really does its own thing with `hatch fmt` unless you tell it not to, and `hatch test` adds a bunch of pytest plugins but makes running with them really easy. The next minor release will support auto-installing the `[test]` extra too. Some sort of doc or docs command is coming soon, and it will either be based on mkdocs or be selectable.
> It seems like the pyproject.toml needs a standard way of declaring separate environments
I think you might be thinking of PEP 735.
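For reference, PEP 735 proposes a `[dependency-groups]` table in `pyproject.toml`; a minimal sketch of what that might look like (group names and pins here are just examples):

```toml
[dependency-groups]
test = ["pytest>=6"]
docs = ["mkdocs"]
# unlike extras, groups can include other groups:
dev = [{include-group = "test"}, {include-group = "docs"}, "pre-commit"]
```

Unlike extras, these groups are not part of the published package metadata, so they can describe dev-only tooling without affecting what installers see.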
Though I think the current situation isn't bad: you add `test` and `doc(s)` extras. In GitHub Actions and Nox and readthedocs, you install with the extras. Then you just need pre-commit, which does environments for you for most of your tools, like ruff, etc. Scripts can use PEP 723. That's 90% of the way there, and there are just a few advanced cases that need manual setup in the `noxfile.py` that aren't in `pyproject.toml`, like pylint if you want to use that.
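For context, PEP 723 lets a single-file script declare its own dependencies in a special comment block; a minimal sketch (the `requests` dependency is an illustrative placeholder):

```python
# /// script
# requires-python = ">=3.9"
# dependencies = ["requests"]  # illustrative placeholder; not imported below
# ///
# Tools like `uv run script.py` or `pipx run script.py` parse the comment block
# above and build a throwaway environment with those dependencies before
# running the script. Because the block is ordinary comments, the file remains
# a plain, valid Python script.
message = "script body runs as normal"
print(message)
```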
`uv` is also actively working on solutions to this, based on what they've learned with `rye` and from the community.
> we only declare them as dependencies
Do you have an example of something only declared in a noxfile in the lesson?
We also should probably have a little section on conda pointing people at pixi.
> Do you have an example of something only declared in a noxfile in the lesson?
There's only a small mismatch: where the `noxfile.py` is declared in https://github.com/INTERSECT-training/packaging/blob/main/_episodes/02-jupyter-to-package.md, we have the line `session.install("numpy", "pytest")`. Later, `dependencies = ["numpy>=1.18"]` and `[project.optional-dependencies]` with `test = ["pytest>=6"]` are added to `pyproject.toml`, but I don't see that the original explicit `session.install` gets updated anywhere. (I may be missing something though.)

> I think you might be thinking of PEP 735.
If I had known about PEP 735, I might have thought of it.
I think the mismatch could be improved. I think the idea was the first time it's shown, a minimal amount was being introduced; optional-dependencies haven't been shown yet. But maybe a callout with the fix below that would help?
Yes, maybe that's the cleanest: we can put a note in the metadata episode, once the optional dependencies are introduced, saying "now modify your task runner config to use the optional dependency."

Another option might be to move the section on task runners until after the metadata have been introduced, so we cover the fundamental basics of the `pyproject.toml` file, including `optional-dependencies`, and only then add the task runner. What would you say to that?
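Concretely, the noxfile change being discussed might look like this (a sketch; the session name is illustrative, and it assumes the lesson's `test` extra):

```python
# noxfile.py -- sketch: install the package plus its "test" extra instead of
# hard-coding session.install("numpy", "pytest"); numpy comes in as a runtime
# dependency and pytest via the extra, both declared once in pyproject.toml.
import nox

@nox.session
def tests(session):
    session.install(".[test]")
    session.run("pytest")
```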
I agree that from a pedagogical point of view it might be too much to go straight ahead and introduce the optional dependencies the first time we have to install something for testing.
My initial impulse is to say: "a `src` directory, minimum project metadata in `pyproject.toml` with the `numpy` dependency, a test in the `tests` directory which you can run once you've manually installed `pytest` in the `venv` you've created but don't yet understand." ... and then we start introducing: "`venv`", "`nox` or `hatch`", "`dependencies` and `optional-dependencies` and how they interact with task runners", "`CITATION.cff`, `LICENSE` etc."

Ideally, RSEs would start by creating an environment and then starting work, but I think that realistically a lot of early career RSEs will start with the code and then tack on the other stuff afterwards.
On a related note, I think part of the tension with #12 is that I feel people might be starting with the code and not know anything about environments yet, so I want to meet them there before we get into the motivations around the environment. But perhaps people do already know about environments, just without a strong understanding yet, in which case it's better to start with the environments and have the whole lesson tell the story of "the clean way" to do a Python project, starting with an environment and building from there.
But then you have to introduce creating a virtual environment and entering a virtual environment. If you start with task runners, then you have a "test" task, even if you are going to modify it later, but you haven't had to "enter" an environment. And I expect a lot of people want to start with the code, and what they will actually do is start installing things into the system Python, which I want to steer people away from. Once you start messing with your system Python, it's hard to clean it back up and not break anything. People really like to stick with however they started; if the lesson starts with running things via tasks, they are more likely to think that way than if it starts with the "easy" way of doing things.
The thing we settled on before was:
A different pattern, starting with virtual environments, then packages, then task runners, isn't necessarily bad either (and it's how https://learn.scientific-python.org/development/ is set up), so if you are more comfortable that way, go for it. It's closer to how the tools were originally developed.
> Ideally, RSEs would start by ..., but I think that realistically ...
Isn't the point of teaching someone something to change what they do? If we only do things the "easy" way and the way people are most likely to be familiar with already, are we really teaching anything?
It's like saying "we should teach `pip install` since everyone is using it already, and not teach `pipx install`". Which, to be fair, is exactly what some people do say.
Also, I was thinking an admonition box below the place where the "wrong" way to do it is introduced, showing the more advanced way with a link to the future lesson, could be useful. For example, next to the `session.install` in the `noxfile.py`, pointing out that we're going to change it later.
We've got a mixture of `test`, `docs`, and `dev` extras in the episodes. We need to be able to deal with:

Option 1: Modular

Extras:

- `test` with `pytest`
- `docs` with `mkdocs` + `mkdocstrings`, or `sphinx`, etc.
- `dev` with `package[test, docs]` and `pre-commit`

Pros: each piece is compartmentalized. Cons: users need to understand a whole zoo of extras.
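For illustration, Option 1 in `pyproject.toml` terms might look like this (the package name `package` is a placeholder; the self-referential `package[test,docs]` extra is supported by modern pip):

```toml
[project.optional-dependencies]
test = ["pytest>=6"]
docs = ["mkdocs", "mkdocstrings"]
# a self-referential extra: "dev" pulls in the package's own extras
dev = ["package[test,docs]", "pre-commit"]
```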
Option 2: All-in-one

Extras:

- `dev` with `pytest`, `mkdocs`, `pre-commit` ... everything

Pros: simple. Cons: test runners must install everything.