fjetter opened this issue 2 months ago
Thanks for raising this @fjetter. I'm glad you liked my blog post! For folks interested in thinking more about the challenges of CalVer I also wrote this blog post a while ago.
I generally have the same feelings about epochs. In theory they sound like a good way to change scheme, but I'm not quite sure how it would work in practice. I know @minrk was exploring how it works in this repo after having a similar conversation last year, so that may be a useful resource.
~I think you only need to specify the `1!` part at publish time. To continue your example, users should be able to `pip install distributed>=1.4` and omit the epoch, and it will resolve correctly. Assuming this is true, we may want to introduce a little automation around the release to handle this. If we had a GitHub Action that was triggered on tags and pushed to PyPI, it could also handle prepending the epoch, so the tag would be `1.4.2` but PyPI would be told `1!1.4.2` by the Action.~ (I was wrong, see https://github.com/dask/community/issues/372#issuecomment-2072035359)
What I don't know is how Conda Forge handles this. Maybe @jakirkham has some thoughts?
If we do want to go down this road, we could choose a less prominent project like `dask-kubernetes` to experiment with, and I'd be happy to take the lead on trying things out. That repo also has the publish-on-tags workflow set up already, which is convenient.
Thinking more about Dask changing scheme, I would be tempted to suggest that we review it on a repo-by-repo basis. For most library-style projects (`distributed`, `dask-kubernetes`, `dask-jobqueue`, etc.) I think something like EffVer (or SemVer) would make the most sense. For projects that hold only documentation (`dask-tutorial`, `dask-examples`, etc.) I think CalVer is still a fine choice.
For `dask/dask` I'm a little more torn. It is a library, but it implements a variety of popular APIs from other libraries, which makes it something of a distribution. So in some ways I think CalVer does make sense here, but in other ways it is just a library, so maybe EffVer would be a good choice.
Using a different versioning scheme for docs-only projects makes sense. I'm less convinced about decoupling the schemes for `dask/dask` and `distributed`, since the two are still tightly coupled. Getting rid of that hard pin is an entirely different topic.
I went back over the `epicepoch` experiment that @minrk did last year, and I decided to push my own test package and play with epochs some more.
You can find the full results here https://github.com/jacobtomlinson/epochexperiments.
The key things I've learned from playing around with it are:
- Users need to include the epoch when specifying a version, e.g. `pip install dask>=1!4.0.0`, which kinda sucks.
- Pip appears to assume the `0!` epoch in some cases, like when using wildcards, which can lead to unintuitive version resolving, like `epochexperiments>=2024.*` resolving to the newest CalVer release and not the latest release.
- `epochexperiments>=2024.0` and `epochexperiments>=2024.*` resolve to different things.
- Versions given without an epoch are assumed to be in the `0!` epoch. For example, if you have released `v1!4.0.1`, you cannot do `pip install epochexperiments==4.0.1`, even though `0!4.0.1` doesn't exist.

These quirks are enough for me to say that we shouldn't go down the road of using epochs, as they will add too much burden and confusion for users. Which pretty much means we are stuck with CalVer, unless we want to go to Dask 3000!
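To make the ordering rules behind these quirks concrete, here is a deliberately simplified model of how PEP 440 treats epochs: a version without an explicit `N!` prefix is in epoch 0, and the epoch is compared before the release numbers. The helpers `parse_version` and `matches` are hypothetical, handle only plain numeric releases with `==` and `>=`, and ignore wildcards and pre/post/dev segments; real tools such as pip use the `packaging` library instead. The version `2024.4.0` below is used as a stand-in CalVer release.

```python
import re
from typing import Tuple

# Hypothetical, simplified PEP 440 model: (epoch, release segments) only.
VersionKey = Tuple[int, Tuple[int, ...]]

_VERSION_RE = re.compile(r"^(?:(\d+)!)?(\d+(?:\.\d+)*)$")


def parse_version(text: str) -> VersionKey:
    """Parse 'N!X.Y.Z' or 'X.Y.Z'; a missing epoch means epoch 0."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported version: {text!r}")
    epoch = int(match.group(1) or 0)
    release = tuple(int(part) for part in match.group(2).split("."))
    # Drop trailing zeros so 2024.0 and 2024.0.0 compare equal, as PEP 440 pads with zeros.
    while len(release) > 1 and release[-1] == 0:
        release = release[:-1]
    return epoch, release


def matches(version: str, specifier: str) -> bool:
    """Check one '==' or '>=' specifier; tuple ordering compares the epoch first."""
    for op in ("==", ">="):
        if specifier.startswith(op):
            wanted = parse_version(specifier[len(op):])
            actual = parse_version(version)
            return actual == wanted if op == "==" else actual >= wanted
    raise ValueError(f"unsupported specifier: {specifier!r}")


print(matches("1!4.0.1", "==4.0.1"))     # False: "4.0.1" is read as 0!4.0.1
print(matches("1!4.0.1", "==1!4.0.1"))   # True
print(matches("1!4.0.0", ">=2024.0"))    # True: epoch 1 sorts after all of epoch 0
print(matches("2024.4.0", ">=4.0.0"))    # True: an old CalVer release also satisfies it
print(matches("2024.4.0", ">=1!4.0.0"))  # False
```

In this model, the `>=4.0.0` case shows why leaving the epoch off is not a meaningful constraint: older CalVer releases in epoch 0 still satisfy it.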
> These quirks are enough for me to say that we shouldn't go down the road of using epochs, they will add too much burden and confusion for users.
I agree. Thanks for checking.
A while ago I read an excellent blog post about version schemes that introduces a version scheme called "Intended Effort Versioning" (EffVer), which I quite like.
I'm not particularly happy with CalVer since it hides way too much information. As an example, we recently changed the default DataFrame backend to use dask-expr, which by any measure should be considered a major change. It has vast potential for improvement for many users and, if we are honest, will likely break stuff for some users. Who knows by heart which version this was released in? Guess what: I was the release manager on that one and still got the version number wrong when I tested myself just now. For the record, it was 2024.3.0. This was possibly one of the most important releases Dask has ever had, and definitely one of the most meaningful in months if not years, but there is nothing to set it apart from a mere maintenance fix.
I don't care that much about semantics, but I would like to use the version number to communicate awesomeness and/or risk, and the EffVer scheme sounds like it addresses most of the concerns (about compatibility and ambiguity) that led Dask to adopt CalVer in the first place. We've been using CalVer for about three and a half years, and I think that's enough time to have collected some experience worth talking about.
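As a rough illustration of how a version number can carry that kind of signal, here is a sketch of an EffVer-style bump. It assumes the three-part `Macro.Meso.Micro` layout, where the component that changes reflects how much effort users are expected to spend adopting the release, and it assumes SemVer-style resetting of the lower components. `Effort` and `bump` are hypothetical names, not part of any Dask tooling, and `1.4.2` is just an example version.

```python
from enum import Enum


class Effort(Enum):
    """Hypothetical labels for how much effort a release asks of users."""

    MACRO = "macro"  # large effort expected to adopt the release
    MESO = "meso"    # some effort may be needed
    MICRO = "micro"  # no effort expected


def bump(version: str, effort: Effort) -> str:
    """Return the next Macro.Meso.Micro version for the given effort level.

    Assumes a plain three-part numeric version and resets the lower
    components, as SemVer does; an illustrative sketch only.
    """
    macro, meso, micro = (int(part) for part in version.split("."))
    if effort is Effort.MACRO:
        return f"{macro + 1}.0.0"
    if effort is Effort.MESO:
        return f"{macro}.{meso + 1}.0"
    return f"{macro}.{meso}.{micro + 1}"


print(bump("1.4.2", Effort.MACRO))  # 2.0.0
print(bump("1.4.2", Effort.MESO))   # 1.5.0
print(bump("1.4.2", Effort.MICRO))  # 1.4.3
```

Under this reading, a change like switching the default DataFrame backend would get a macro bump, so the number alone would set it apart from a maintenance fix.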
How happy are folks generally with CalVer?
And most importantly... If we were to adopt another versioning scheme (even if it's not EffVer), what would that look like? There are version epochs, but I would hate it if users had to specify a version like `1!1.4.2`, since the epoch identifier is pretty rare. Are there other possibilities?