jakirkham opened 1 year ago
Can we merge these two issues just to make it easier to track them?
The issue Axel raises seems like a subpoint of this issue (depending on what we decide). Namely, do we want to opt in to this newer/slimmer ABI, and how does that fit into NumPy 2?
Sure. IMO Axel's issue is a subset of this one. I don't have strong opinions on which one to keep, or if you want to keep both, but I also don't want to get lost on two mega-threads :grimacing:
Added Axel's item to the list above
Handling the ABI is the key point here (that, and current packages missing a `<2` constraint). I updated the added item because the summary was not accurate.
Normally I'd say we do a dual migration (keep 1.x; add 2.0), but numpy has around 5000** dependents in conda-forge, so that would be a pretty substantial CI impact, especially if it takes a while to drop 1.x.
**

```
> mamba repoquery whoneeds numpy -c conda-forge -p linux-64 > tmp
> # edit to remove header
> python
>>> q = open("tmp", "r").readlines()
>>> p = {x.strip().split(" ")[0] for x in q} - {""}
>>> len(p)
4898
```
Obviously not all of them are compiling against numpy, but still...
I updated the added item because the summary was not accurate.
Thanks Axel!

Anyone should feel free to update the issue as needed.
Following up on our discussion earlier about improving the visibility of `NPY_FEATURE_VERSION`, I started this NumPy PR ( https://github.com/numpy/numpy/pull/24861 ) to message how the value is set. It also includes a note about one approach we might take to ensure that value is embedded in built binaries, though maybe there are better approaches for that portion.
There should now be a string baked into binaries built with NumPy to notate what kind of NumPy compatibility they have
It is worth noting that thanks to Axel and others we now have NumPy 2.0.0rc1 packages: https://github.com/conda-forge/numpy-feedstock/issues/311
Also ecosystem support of NumPy 2 is being tracked in this issue Ralf opened: https://github.com/numpy/numpy/issues/26191
We are now in a good spot to start testing building packages with NumPy 2
I discussed this with @rgommers recently, and one important point that he brought up is the situation with `pin_compatible`, which we'll have to fix as part of any migration effort, probably with a piggyback migrator, since we'll need to rewrite the recipes.

In particular, since `numpy` isn't separated into a library and run-time component, we don't have a run-export, and so feedstocks use `pin_compatible` under `run:`. However, this will be doubly incorrect in the new setup - for one, because our `NPY_FEATURE_VERSION` (which forms the lower bound) will be lower than the numpy version at build time, and second, because the upper bound should be something like `<2.{{ numpy.split(".")[1] | int + 3 }}` (for a project that's free of deprecation warnings; anything else might be deprecated in `2.{N + 1}` and removed after two releases in `2.{N + 3}`).
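To make the current pattern concrete, a hypothetical downstream recipe today looks roughly like this (illustrative only; this is the usage a piggyback migrator would need to rewrite):

```
# hypothetical downstream feedstock meta.yaml fragment (illustrative only)
requirements:
  host:
    - python
    - numpy
  run:
    - python
    # current pattern: the lower bound is taken from the numpy version present in host,
    # which is exactly what becomes wrong once we build against numpy 2.x
    - {{ pin_compatible("numpy") }}
```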
In particular, since `numpy` isn't separated into a library and run-time component, we don't have a run-export [...]

Of course, if there's appetite for a split into `libnumpy` (with a run-export) and `numpy` (the python bits), that might be worth a thought as well. But then even more so, we'd need a piggyback.
Of course, if there's appetite for a split into `libnumpy` (with a run-export) and `numpy` (the python bits), that might be worth a thought as well.

That doesn't sound good to me as a custom conda-forge split. If we want anything like that, let's do this properly and create a numpy-headers package that's officially supported by NumPy and that can be used by anyone (unless they need a static library or `numpy.f2py`) with a build-time dependency on the NumPy C API. We actually discussed this in a NumPy community meeting, and it seems feasible.
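Purely as an illustrative sketch of that idea (a numpy-headers package does not exist today; the package name and split are hypothetical):

```
# hypothetical downstream recipe if a build-time-only "numpy-headers" package existed
requirements:
  host:
    - python
    - numpy-headers   # hypothetical: headers/C API only, no runtime component, no static lib, no f2py
  run:
    - python
    - numpy           # runtime bounds would come from the project's own metadata / run-export
```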
In particular, since numpy isn't separated into a library and run-time component, we don't have a run-export, and so feedstocks use pin_compatible under run:
We do have a run_export on numpy.
Yeah... Clearly I shouldn't be writing these comments from a phone.

I misremembered that part, but in that case the task becomes easier - we just set up the right run export in numpy itself, and then remove `pin_compatible` in feedstocks that compile against numpy. Right?
Another question we have to answer soon: what mechanism do we want to use for setting `NPY_FEATURE_VERSION`... Perhaps the easiest would be an activation script in `numpy`, but that's a fairly big hammer, as it persists beyond build time and into all user environments.

Right now I'm thinking of setting `NPY_FEATURE_VERSION` in the global pinning (cleanly overrideable per feedstock where necessary), and then using that in conda-forge-ci-setup to populate the environment variable that `numpy` will pick up (and, if necessary, in the compiler activation feedstocks, e.g. for `CFLAGS`).
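As a rough sketch of that idea (the key name mirrors the environment variable; the value is purely illustrative, not a decided baseline):

```
# hypothetical entry in conda-forge-pinning's conda_build_config.yaml
NPY_FEATURE_VERSION:
  - "1.19"   # baseline C API level; a feedstock could override this in its own conda_build_config.yaml
```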
The only issue there is that the run-export on `numpy` is not dynamic, in the sense that it gets fixed to the value of `NPY_FEATURE_VERSION` at the build time of numpy, and not the (potentially different) one in play when building something else against numpy. Unless I'm overlooking something, we'd therefore need to transform (rather than remove) the existing uses of `pin_compatible("numpy")` into something like

```
- {{ pin_compatible("numpy", lower_bound=NPY_FEATURE_VERSION) }}
```

while the upper bound (`<2.{N + 3}`) would be set by the run-export on `numpy`.
What if we had something like...?
```
{% set version = "2.0.0" %}

package:
  name: numpy
  version: {{ version }}

...

build:
  ...
  run_exports:
    - {{ pin_subpackage("numpy", lower_bound=os.environ.get("NPY_FEATURE_VERSION", version)) }}
```
That way we can defer this environment variable setting to packages. If they don't set something, we can provide a sensible default (either `version` or something else we decide).

We could also consider whether conda-build could allow `NPY_FEATURE_VERSION` to be a pass-through environment variable, or whether we handle that within conda-forge with some recipe changes to pass it through ourselves. This would let us use a global setting (as you suggest).
I don't think this type of `NPY_FEATURE_VERSION` setting is useful at all. NumPy guarantees to set it to a version that is not higher than the first numpy release that supported the Python minor version being built for. So all produced extension modules will work with all possible numpy versions that can actually be installed.

Hence, doing nothing should be the right default here; trying to change it from NumPy's default will likely only be the cause of extra complexity/confusion, and perhaps bugs.
That way we can defer this environment variable setting to packages

I'd be surprised if it works like that. AFAIU, that `os.environ` call will be resolved while building numpy.
I don't think this type of `NPY_FEATURE_VERSION` setting is useful at all. NumPy guarantees to set it to a version that is not higher than the first numpy release that supported the Python minor version being built for.

Leaving aside NEP29, this is a quantity we have to be able to control IMO. Otherwise our metadata for packages building against numpy is bound to be wrong, and deteriorate over time (when numpy inevitably moves the lower bound, and we don't move in sync across all our feedstocks). I don't see how we can reasonably avoid making `NPY_FEATURE_VERSION` explicit in conda-forge in some way.
I very well could be wrong. It is easy to test
Really we just need more ideas to sample from. It's more important that we have a large variety before selecting one. So feel free to propose more
Otherwise our metadata for packages building against numpy is bound to be wrong, and deteriorate over time (when numpy inevitably moves the lower bound, and we don't move in sync across all our feedstocks).
It won't be wrong. The metadata that is in the upstream releases (i.e. the `dependencies` key in `pyproject.toml` files) is going to be updated by package authors, and that's the thing that should be relied on by conda-forge builds. The build-time version of numpy is now simply taken off the table completely, it no longer adds an extra constraint.
I'm not sure I follow. Say a project has `numpy >=1.24,<2.3` in its `pyproject.toml`, is there some sort of hook that populates `NPY_FEATURE_VERSION` to 1.24? If so, how would that constraint arrive in the metadata of the packages we build?

Or do you mean that the default for that in numpy is so low (1.19?) that it won't ever become a factor? That seems doubtful to me.

Even aside from those questions, we still have an interest to provide a common baseline for numpy compatibility (so that most of conda-forge is still usable with the oldest supported numpy), and avoid that individual packages move on too quickly (unless they really need to), or extremely slowly (i.e. going back to 1.19 adds about 2 years on top of what NEP29 foresees w.r.t. being able to use a given ABI feature).
The build-time version of numpy is now simply taken off the table completely, it no longer adds an extra constraint.
In summary, this seems highly dubious to me. There's still a lower bound somewhere, either in numpy's default feature level, or in an explicit override of `NPY_FEATURE_VERSION`. However it comes to be, we should represent that lower bound in our package metadata exactly (or at the very least, something tighter).
Or do you mean that the default for that in numpy is so low (1.19?) that it won't ever become a factor? That seems doubtful to me.
This. And it's not doubtful, it is guaranteed to work. The whole point is to take away the build-time version as a thing that dynamically overrides the declared runtime dependency range.
Even aside from those questions, we still have an interest to provide a common baseline for numpy compatibility (so that most of conda-forge is still usable with the oldest supported numpy), and avoid that individual packages move on too quickly
No, that does not make sense. If a package has `numpy>=x.y` in its constraints, you cannot just ignore that. The package author bumped the lower version for some reason, so if you tweak the metadata to say `numpy>=x.y-N` instead, you will allow a broken combination of packages.
In summary, this seems highly dubious to me. There's still a lower bound somewhere, either in numpy's default feature level, or in an explicit override of NPY_FEATURE_VERSION. However it comes to be, we should represent that lower bound in our package metadata exactly (or at the very least, something tighter).
No, and no. The lower bound is whatever `dependencies=numpy...` in `pyproject.toml` says, or it's a bug in the package (even if the package internally sets NPY_FEATURE_VERSION, which should be quite rare).

What the conda-forge tooling should do is check that the `meta.yaml` and `pyproject.toml` metadata is consistent - and I think that is a feature already present for regular Python packages. I.e., start treating `numpy` like any other Python package when building against numpy 2.x.
No, that does not make sense.
You chopped off the part of my quote that accounts for the scenario you describe.
The lower bound is whatever `dependencies=numpy...` in `pyproject.toml` says

I'm not saying we should disregard runtime constraints. I'm saying we also need to express constraints arising from the feature level - both of those can be attached to the same package without conflict. They stack and the intersection of both is what's actually permissible for the final artefact.
What the conda-forge tooling should do is check that the `environment.yml` and `pyproject.toml` metadata is consistent

I don't see this happening soon enough to be available for the 2.0 transition; it would need work on conda-build AFAICT.
and I think that is a feature already present for regular Python packages. I.e., start treating `numpy` like any other Python package when building against numpy 2.x.

I'm not sure what you mean here. Presumably by "python package" you don't mean "pure python" packages? Anything else that has a run-export (to my knowledge) uses the build-time version as a lower bound. That's precisely the issue that requires attention here, because of the very unusual situation where building against numpy 2.0 produces something compatible with `>=1.x`.
I'm saying we also need to express constraints arising from the feature level - both of those can be attached to the same package without conflict. They stack and the intersection of both is what's actually permissible for the final artefact.
What I am trying to explain is that that stacking is not doing anything, because:

- `numpy` will never set the feature version in a way that allows for this to have any effect, and
- if a package sets `NPY_FEATURE_VERSION` to something higher than what it says in its `pyproject.toml`, that's a bug in the package and should be fixed there by fixing its `dependencies=` metadata.

I'm not sure what you mean here. Presumably by "python package" you don't mean "pure python" packages?
Ah, I did mean this since I remember dependencies being flagged on PRs - but it may not be ready indeed, since it's marked as experimental:
So it's still mostly manual then, depending on the feedstock maintainers to keep `pyproject.toml` and `meta.yaml` in sync?
For clarification, should `environment.yml` here be the recipe's `meta.yaml`? Or do you mean something else Ralf?
There are different levels of bot inspection or automation. However this is opt-in at this point. It is seeing some use in conda-forge, but we are probably not at the point where we could turn this on by default. Though that's a separate discussion I think
For clarification, should `environment.yml` here be the recipe's `meta.yaml`? Or do you mean something else Ralf?

Yes indeed. General tiredness. Editing my comment to say `meta.yaml` to avoid further confusion.
Thanks Ralf! All good. Appreciate hearing your insights and having your support here.

Can imagine there are a lot of spinning plates with this work.
What I am trying to explain is that that stacking is not doing anything, because `numpy` will never set the feature version in a way that allows for this to have any effect, and [...]

I'm still not sure we're speaking the same language here. We'll have numpy `2.0` in the host environment, and we need to create a lower bound for the numpy run-export, i.e. `numpy >=1.x`. Where should this `x` come from?
We really need to have correct metadata, because the solver will flee into the past if we don't close off incorrect avenues. For example, we're still building for py38, and have numpy 1.18 builds for that. Anything built against numpy 2.0 (assuming I understood correctly that the default ABI level is going to be 1.19) needs to be impossible to install with 1.18, hence needs the right constraints.
You're saying (effectively) that people's numpy `dependencies=` in their `pyproject.toml` are always going to be higher than the default NPY_FEATURE_VERSION, and that's an assumption that I don't think will hold. As the first (I swear...) random example I looked at, cvxpy HEAD uses `numpy >=1.15`. So we need to deal with this.
And once we deal with this, it's IMO not a good idea to just define a fixed run-export in `numpy` itself, because packages will want to override this - for example, if they need an ABI feature that's newer than the default of 1.19 for whatever reason. Hence why I'm tending towards

```
- {{ pin_compatible("numpy", lower_bound=NPY_FEATURE_VERSION) }}
```

per feedstock, with `NPY_FEATURE_VERSION` part of the global pinning (and overrideable per feedstock) rather than defined in `numpy` itself.
Anything built against numpy 2.0 (assuming I understood correctly that the default ABI level is going to be 1.19) needs to be impossible to install with 1.18, hence needs the right constraints.

It is impossible. The lowest Python version supported by NumPy 2.0 is `3.9`. There is no `numpy` 1.18 package for Python 3.9 on either conda-forge or on PyPI:
$ mamba search numpy=1.18.5
Loading channels: done
# Name Version Build Channel
numpy 1.18.5 py36h7314795_0 conda-forge
numpy 1.18.5 py36he0f5f23_0 conda-forge
numpy 1.18.5 py37h8960a57_0 conda-forge
numpy 1.18.5 py38h8854b6b_0 conda-forge
So yes, by design of how numpy sets the default targeted C API, it is impossible to get an incompatible combination here, even in an example like cvxpy where they set their lower bound to 1.15 (that could well be valid if they still support Python 3.7, no idea).
For example, we're still building for py38
There cannot be a numpy 2.0 package for py38, so this isn't relevant.
So we need to deal with this.
I'm still convinced that this is not true - it does not need dealing with explicitly in conda-forge recipes because it cannot go wrong. It's perfectly okay for the conda-forge `cvxpy` version to have `>=1.15` in its metadata; that will work for any actual `numpy` package built by conda-forge.
So yes, by design of how numpy sets the default targeted C API, it is impossible to get an incompatible combination here
OK, that is the key piece I was missing. The necessary `>=1.x` I was talking about is enforced implicitly, simply by the lack of availability of older numpy builds for whatever Python version we're using.

So, for my understanding (and walking through the logic): once numpy drops support for a given Python version, the default `NPY_FEATURE_VERSION` can move up (ignoring for now that not every release changes ABI) until the first numpy version that supported the currently oldest-supported Python version? Is that the idea?
Yes, that sounds right. To rephrase that:

- `pyproject.toml` contains and enforces the lowest-supported Python version, e.g.: `requires-python = ">=3.9"`
- The first numpy version that supported that Python version (`1.19.0` for py39) determines the maximum version that `NPY_FEATURE_VERSION` may be set to
- When `>=3.9` is bumped to `>=3.10`, then that max-allowed version for `NPY_FEATURE_VERSION` moves up.

We must be getting this right in numpy, or there'll be a ton of issues. And I don't want conda-forge to have to worry about setting `NPY_FEATURE_VERSION` for several reasons:
Thanks @rgommers, @h-vetinari, and @isuruf for joining the meeting today!
My understanding of next steps from our discussion are to

- Start a NumPy 2 migrator that uses our RC packages (similar to what was done for Python 3.12)
- Start a piggyback migrator to drop `{{ pin_compatible("numpy") }}` from recipes
  - `run_exports` handles this correctly for NumPy 1
  - `pin_compatible` will be unneeded as NumPy targets the oldest NumPy C API for a given Python version (say 3.9)
- Update the `run_exports` in our NumPy 2 packages to make NumPy 1.22 the minimum supported version

Are there any other steps we should consider? Anything we should revise in the steps above?

Once we are happy with the list, will add a checklist to the OP of this issue to track.

Please let me know what you think
That sounds about right to me. Before that, https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/issues/516 may need checking/doing?
My understanding of next steps from our discussion are to
- Start a NumPy 2 migrator that uses our RC packages (similar to what was done for Python 3.12)
This is the first step that's necessary, but that already runs into several problems, see https://github.com/conda-forge/conda-forge-pinning-feedstock/pull/5790
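For reference, a very rough sketch of the kind of migration file involved (field names follow the usual conda-forge migration format, but this is illustrative only and not the contents of that PR; numpy being zipped with python in the global pinning is part of what makes the real thing trickier):

```
# illustrative sketch of a conda-forge-pinning migration, e.g. migrations/numpy2.yaml
__migrator:
  kind: version
  build_number: 1
  migration_number: 1
  commit_message: "Rebuild for numpy 2.0"
numpy:
  - 2.0.0rc1          # replaces the current numpy pin for migrated feedstocks
channel_sources:
  - conda-forge/label/numpy_rc,conda-forge   # pull the RC package from its opt-in label
migrator_ts: 1700000000.0                    # placeholder timestamp
```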
- Start a piggyback migrator to drop `{{ pin_compatible("numpy") }}` from recipes

We need to write this (not so difficult) and attach it (in regro/cf-scripts) to the `numpy2` migrator from the previous point. I'll do this when we get closer to having a mergeable `numpy2` migrator.
- Update the `run_exports` in our NumPy 2 packages to make NumPy 1.22 the minimum supported version
This one's easy: https://github.com/conda-forge/numpy-feedstock/pull/313
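As a minimal illustration of what that amounts to (not necessarily the exact contents of that PR), assuming the `<2.{N + 3}` upper-bound rule discussed earlier applied to a 2.0.x build:

```
# illustrative run_exports in the numpy feedstock for a 2.0.x build
build:
  run_exports:
    - numpy >=1.22,<2.3   # lower bound: oldest targeted C API; upper bound: <2.{0 + 3}
```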
- https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/issues/516 may need checking/doing?
Done: https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/pull/712
Thanks Axel and Ralf!
Created a list in the OP linking PRs or issues. Please double check that. Happy to update as needed
For the piggyback migrator, think we don't need to wait actually. More details in issue ( https://github.com/regro/cf-scripts/issues/2469 ). Happy to discuss further there
There's no need to build for both 1.x and 2.x at the same time. If a package does not support numpy 2, the migration is held at that point and we wait until that package supports numpy 2.
Agreed the migrator can just replace NumPy 1.22 with NumPy 2.0. Both will produce packages that work on NumPy 1.22+ (the latter providing NumPy 2 support)
There's no need to build for both 1.x and 2.x at the same time.
Of course there is - pulling forward the building of the 2.0-compatible builds before the release of numpy 2.0 GA. This is exactly what we did for python 3.12.0rc. Once the GA release happens, we can then publish numpy 2.0 into main, and all the other packages built against 2.0rc1 will be installable in a numpy 2.0 world already.
pulling forward the building of the 2.0-compatible builds before the release of numpy 2.0 GA. This is exactly what we did for python 3.12.0rc.
This is what the consensus in the core call was. It may well turn out to be hard/impossible due to constraints around pinning/zips/smithy, but that was the desire at least.
Maybe I'm misunderstanding. Do we want to publish packages built using NumPy 2.0.0rc1 to a different label (say to do additional testing on them before releasing them in the wild)? Or do we just want to publish them to `main`?
Maybe I'm misunderstanding. Do we want to publish packages built using NumPy 2.0.0rc1 to a different label (say to do additional testing on them before releasing them in the wild)? Or do we just want to publish them to `main`?
So there's been a fair amount of relevant discussion in https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/pull/712 that should probably be moved here, but for completeness:
What we did for python 3.12.0rc was that we pulled `python` itself from a label (`conda-forge/label/python_rc`), but the packages built against 3.12 were published to `main`. Still, they couldn't be installed, because they depended on a python build that was only available by explicitly opting in with `-c conda-forge/label/python_rc`. However, because of the ABI stability guarantee of CPython 3.12.0rc vs. GA, the moment we published python `3.12.0` to `main`, all those builds became installable (we had roughly ~1000 feedstocks done by the GA release).

Numpy gives the same guarantees about ABI stability between rc & GA, and so we could do exactly the same thing. I.e. packages built against numpy publish to `main`, but depend on a `numpy` that requires an opt-in channel. That would give us time to roll things out in advance, but then the only way to roll that out over a period of ~weeks before GA release is if we do the 2.0 builds additively.
packages built against numpy publish to `main`, but depend on a `numpy` that requires an opt-in channel

Except that there is no runtime dependency on a newer NumPy: That package will be installable right away since it also works with old NumPy versions. The logic you should need is basically:

(Quite likely, I am misreading, but it looked a bit like that attempt wanted to build two versions, which would be fine but seems unnecessarily complicated.)
Except that there is no runtime dependency on a newer NumPy: That package will be installable right away since it also works with old NumPy versions.
You're right about that, the lower run-time dependence changes the calculus (unless we add another constraint on packages built during the RC phase, which I would have ended up with, see below).
Use NumPy 1.22 for Python 3.8
That's a good point actually; ~I think we should seriously consider dropping 3.8, rather than complicating the numpy 2.0 setup into a before/after 3.9 thing.~ should be OK iff we build everything else against 2.0rc1
Even if a package only supports 1.x at runtime as long as compilation succeeds it'll be fine to compile it with NumPy 2.
My assumption was we don't yet want builds against the RC to immediately be generally available, but if people are fine with that, why not?
but if people are fine with that, why not?
I would say this is a gamble that may be fine (assuming the build succeeds). It seems pretty unlikely to go bad, and if it fails it would also seem to point to critical NumPy issues?
Basically, it isn't even a gamble: SciPy, matplotlib, pandas, etc. etc. already do this for wheels!
The one issue I can think of is weirder constructs that we saw in dependencies of `eigenpy`: `eigenpy` dependencies may use NumPy through `eigenpy`, which effectively makes them a single compilation unit from NumPy's perspective. That could be problematic unless they are transitioned all in one go (pinning each other's versions). Likely, that isn't even a problem for conda, though.
My assumption was we don't yet want builds against the RC to immediately be generally available, but if people are fine with that, why not?
Either approach seems potentially reasonable
We could start by putting a few packages built with the NumPy RC in a special label and test them out for a bit. Then, once we are comfortable, flip over to building and publishing them to main
Had thought maybe we want to do this with a few core ones like SciPy, Matplotlib, Pandas, some of the scikit-*s. That said, no strong feelings on this approach
I updated https://github.com/conda-forge/conda-forge-pinning-feedstock/pull/5790 to build against 2.0.0rc1 directly; it looks like it still needs a smithy fix, see https://github.com/conda-forge/conda-smithy/issues/1911
Migration is a-go! :)
Oh, you know, just a little migration... *checks notes*... 2800 affected feedstocks
OK, it looks like that number is not correct - one of the packages that stood out to me was importlib_metadata, which doesn't depend on numpy at all. It's probably not a great sign that this is being picked up.
NumPy is currently working on a NumPy 2.0 release, which is planned to come out later this year. Here are the current (draft) release notes. Also here's the upstream tracking issue ( https://github.com/numpy/numpy/issues/24300 ), and ecosystem compatibility tracker.
Some questions worth discussing:

- `numpy` pins in packages ( https://github.com/conda-forge/conda-forge-repodata-patches-feedstock/issues/516 )?

Todos:
cc @conda-forge/core