dask / community

For general discussion and community planning. Discussion issues welcome.

Are we happy with CalVer? Is there any easy way to move past it? #372

Open fjetter opened 2 months ago

fjetter commented 2 months ago

A while ago I read an excellent blog post about version schemes. It introduces a scheme called "Intended Effort Versioning" (EffVer), which I quite like.

I'm not particularly happy with CalVer since it hides way too much information. As an example, we recently changed the default DataFrame backend to use dask-expr, which by any measure should be considered a major change. It has vast potential for improvement for many users and, if we are honest, will likely break stuff for some users. Who knows, by heart, which version this was released in? Guess what, I was the release manager on that one and even I got the version number wrong when I tested myself just now. For the record, it was 2024.3.0. This was possibly one of the most important releases dask ever had, definitely one of the most meaningful in months if not years, but there is nothing to set it apart from a mere maintenance fix.

I don't care that much about semantics but I would like to use the version number to communicate awesomeness and/or risk and the EffVer scheme sounds like it's addressing most concerns (about compatibility and ambiguity) that led dask to adopt CalVer in the first place. We've been using CalVer for about three and a half years and I think it's enough time to collect some experience to talk about it.

How happy are folks generally with CalVer?

And most importantly... If we were to adopt another versioning scheme (even if it's not EffVer), what would that look like? There are version epochs, but I would hate it if users had to specify a version like 1!1.4.2 since the epoch identifier is pretty rare. Are there other possibilities?
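For readers who have not met epochs before, here is a minimal sketch of how the `N!` prefix changes ordering. It is a simplified stand-in for the real rules, not what pip runs: `sort_key` is a hypothetical helper that only understands plain `X.Y.Z` or `N!X.Y.Z` strings and ignores pre-releases, post-releases and zero padding.

```python
from __future__ import annotations


def sort_key(version: str) -> tuple[int, tuple[int, ...]]:
    """Return (epoch, release) for a plain 'X.Y.Z' or 'N!X.Y.Z' string.

    Hypothetical, simplified helper: it ignores pre/post/dev releases,
    local versions, and the zero padding the full rules apply.
    """
    epoch_text, sep, release_text = version.partition("!")
    if not sep:
        # No "!" present: the whole string is the release and the epoch is 0.
        epoch_text, release_text = "0", version
    release = tuple(int(part) for part in release_text.split("."))
    return int(epoch_text), release


calver = sort_key("2024.3.0")
plain = sort_key("1.4.2")
with_epoch = sort_key("1!1.4.2")

print(plain < calver)       # True: without an epoch, 1.4.2 sorts before 2024.3.0
print(with_epoch > calver)  # True: epoch 1 outranks every epoch-0 version
```

The epoch is compared before anything else, which is why 1!1.4.2 can follow 2024.3.0 while a bare 1.4.2 cannot.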

jacobtomlinson commented 2 months ago

Thanks for raising this @fjetter. I'm glad you liked my blog post! For folks interested in thinking more about the challenges of CalVer I also wrote this blog post a while ago.

I generally have the same feelings about epochs. In theory they sound like a good way to change scheme, but I'm not quite sure how it would work in practice. I know @minrk was exploring how it works in this repo after having a similar conversation last year, so that may be a useful resource.

~I think you only need to specify the 1! part at publish time. To continue your example users should be able to pip install distributed>=1.4 and omit the epoch and it will resolve correctly. Assuming this is true we may want to introduce a little automation around the release to handle this. If we had a GitHub Action that was triggered on tags and pushed to PyPI it could also handle prepending the epoch, so the tag would be 1.4.2 but PyPI would be told 1!1.4.2 by the Action.~ (I was wrong, see https://github.com/dask/community/issues/372#issuecomment-2072035359)

What I don't know is how Conda Forge handles this. Maybe @jakirkham has some thoughts?

If we do want to go down this road we could choose a less prominent project like dask-kubernetes to experiment with and I'd be happy to take the lead on trying things out. That repo also has the publish on tags workflow set up already which is convenient.

jacobtomlinson commented 2 months ago

Thinking more about Dask changing scheme, I would be tempted to suggest that we review it on a repo-by-repo basis. For most library-style projects (distributed, dask-kubernetes, dask-jobqueue, etc.) I think something like EffVer (or SemVer) would make most sense. For projects that hold only documentation (dask-tutorial, dask-examples, etc.) I think CalVer is still a fine choice.

For dask/dask I'm a little more torn because it is a library, but it implements a variety of popular APIs from other libraries, which makes it something of a distribution. So in some ways I think CalVer does make some sense here. But in other ways it is just a library so maybe EffVer would be a good choice.

fjetter commented 2 months ago

Using a different versioning scheme for docs-only projects makes sense. I'm less convinced about decoupling the schemes for dask/dask and distributed since both are still tightly coupled. Getting rid of that hard pin is an entirely different topic.

jacobtomlinson commented 2 months ago

I went back over the epicepoch experiment that @minrk did last year and I decided to push my own test package and play with epochs some more.

You can find the full results here https://github.com/jacobtomlinson/epochexperiments.

The key things I've learned from playing around with it are:

These quirks are enough for me to say that we shouldn't go down the road of using epochs, they will add too much burden and confusion for users. Which pretty much means we are stuck with CalVer, unless we want to go to Dask 3000!
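To spell out the "Dask 3000" remark: without an epoch, a new release only sorts after the existing CalVer releases if its leading number is larger than the year. A minimal sketch, assuming plain `X.Y.Z` strings and using 2024.3.0 (the release mentioned earlier in this thread) as the reference point; `sorts_after` is a hypothetical helper, not part of any packaging tool.

```python
def sorts_after(candidate: str, reference: str) -> bool:
    """Compare two plain 'X.Y.Z' versions component by component.

    Hypothetical helper for illustration; it assumes both strings have
    the same number of purely numeric components.
    """
    def as_tuple(version: str) -> tuple:
        return tuple(int(part) for part in version.split("."))

    return as_tuple(candidate) > as_tuple(reference)


for candidate in ("3.0.0", "2025.0.0", "3000.0.0"):
    print(candidate, sorts_after(candidate, "2024.3.0"))
# 3.0.0 False
# 2025.0.0 True
# 3000.0.0 True
```

So a SemVer-looking 3.0.0 would sort as older than every CalVer release already published, which is the trap an epoch is meant to get around.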

fjetter commented 2 months ago

> These quirks are enough for me to say that we shouldn't go down the road of using epochs, they will add too much burden and confusion for users.

I agree. Thanks for checking.