Open muellerzr opened 2 hours ago
Hey. I'm happy to look into why that version of datasets is selected, but it's likely that this is an equally valid resolution to whatever you're seeing with pip install
. Your datasets dependency is unconstrained, so my guess is that uv is installing a new version of some other package (as compared to pip's resolution) which is leading us to be forced into selecting an older version of datasets. In the absence of defined constraints, it's not clear that using datasets 3.0.0 is better than 2.x.x (assuming we're giving you a newer version of some other package than in pip's resolution). My suggestion here is typically to add a constraint on datasets itself, if you're expecting a certain version to be used.
(I'll try to figure out the dependencies that are leading to this resolution, though. It's typically because something else you depend on is applying an upper-bound on datasets.)
@charliermarsh however the key is it's not. Using pip install
natively gives a different valid resolution than using uv
. If that were the case, I'd expect them to be the same as well but they are not
Can you clarify your comment? There are many equally-valid resolutions for that set of requirements. Separately, do you have a reference to pip's resolution that I can use for comparison (e.g., a comparable CI run)?
@charliermarsh sure.
Currently working run: https://github.com/huggingface/accelerate/actions/runs/11285497593/job/31412054466?pr=3154
Broken run: https://github.com/huggingface/accelerate/actions/runs/11293796747/job/31412771583
(See the Show Installed Libraries steps)
Otherwise phrased as:
uv pip install
's resolution version for datasets
leads to datasets==2.14.4
pip install
's resolution version for datasets
leads to the expected datasets==3.0.1
Everything is the same between both runs except using uv
to do the installs
Ok... So it looks like the uv resolution ends up choosing a more recent version of fsspec (fsspec==2024.9.0
), while the pip resolution ends up with fsspec==2024.6.1
. datasets
then applies an upper-bound constraint to fsspec
(fsspec[http] (<=2024.6.1,>=2023.1.0)
), so we backtrack to an older version of datasets that does not have such a bound.
pip's resolution is more intuitive in that case (and we're working on adding better heuristics for these kinds of upper-bound cases -- it also happens often with Numba, which puts an upper-bound on Numba), but formally both are correct -- they're just different solutions given the set of constraints. That's why I'd typically recommend adding a datasets>=3
lower-bound to your requirements. Otherwise, it's not incorrect for a resolver to give you a resolution that prioritizes getting the most recent fsspec
, even if it's not the preferred outcome.
\cc @konstin as another example of this upper-bound problem leading to a suboptimal resolution
Sadly doing such a large pin like that is a bit of a non-negotiable with our testing and resolutions since we build our frameworks to be backwards compatible as much as possible. I'd rather wait to upgrade to uv
until this problem can be fixed.
To clarify, I consider the resolution here to be formally correct, but unintuitive for users (and want to change it -- the same pattern has come up before).
Another option is you can reorder your dependencies to instruct uv to prioritize datasets over other packages.
I want to offer a more general perspective on version constraints: Using the APIs of your dependencies, you require at least a certain version of that dependency implictly: There's a specific version under which your code works, and lower version that doesn't have the APIs you use yet. By specifying lower bounds on your dependencies, you make those bounds explicit, and you avoid your users accidentally installing an older, incompatible version due to other (transitive) dependencies they have: In your case, datasets v3 is selectable, but in other cases, a user may depend on a library foo that has datasets<3
, they get a broken accelerate and don't realize that it's due to the transitive datasets v2 that should have never been selected.
You can test your lower version bounds with --resolution lowest
or --resolution lowest-direct
(https://docs.astral.sh/uv/concepts/resolution/#lower-bounds). We recommend a CI jobs that checks whether the package works with one of these resolution-lowest settings.
As a user who was advertised to use uv as a drop in replacement, this comes with expectations that behaviors, including resolutions, will be the same between uv and standard pip. The point of this issue is it's not, and if I have the choice of running a debugging script to figure out sudden lower bounds and using the same, I will stick to using what doesn't break. (Until it's fixed)
Hi! I'm trying to upgrade accelerate to use
uv pip install
and we're hitting a weird issue where the dependency resolution wants to only install a very old version ofdatasets
and as a result our suite fails. I'm specifying the exact same environment and everything as the working old installation via justpip install
& python 3.8. As a result it should be installing datasets 3.0.0 like the original workflow does, however we're finding it's installing an old 2.x.x version instead.Any help would be appreciated.
Here is the old workflow and here is the new one.
A broken run log is available here, which is showing that the wrong version of
datasets
is being installed (should be 3.0.0+)