dask / community

For general discussion and community planning. Discussion issues welcome.
19 stars 3 forks source link

Drop zero-padding in release version? #249

Closed jrbourbeau closed 2 years ago

jrbourbeau commented 2 years ago

In https://github.com/dask/community/issues/100 we converged on a YYYY.MM.X calver version format. In particular, we decided to zero pad the month (i.e. use 2022.05.1 instead of 2022.5.1) so that string sorting versions works properly.

However, modern Python version handling normalizes zero padding away so when you, for example, go to PyPI / conda-forge you'll see versions displayed without zero padding https://pypi.org/project/dask/2022.5.1. To my knowledge, this hasn't been a big issue for common Python systems like PyPI/conda-forge, but @jacobtomlinson brought up in a recent monthly community meeting that this zero padding can be problematic for other systems. Specifically, https://github.com/dask/dask-gateway/pull/467 ran into zero padding being problematic for publishing Helm charts.

I'm opening this issue to gather feedback on whether we want to drop zero-padding in our version convention. FWIW my personal opinion is that, while string sorting is nice to have, things like publishing Helm charts (and maybe other systems as well) seem more important to account for. Additionally, even though PyPI/conda-forge treat 2022.05.1 / 2022.5.1 the same, this small discrepancy can still lead to occasional user confusion (xref https://github.com/dask/distributed/issues/6432). Because of this, I'm in favor of dropping zero-padding.

cc @jsignell @jacobtomlinson @jakirkham @quasiben @mrocklin @martindurant @fjetter

jacobtomlinson commented 2 years ago

I am also in favour of dropping the padding.

Zero padded values are not valid SemVer, and while we do not use SemVer there are benefits to remaining valid with it. In my experience I've seen tools strip the zero padding automatically (PyPI does this) and others outright fail.

@consideratio and I have been trying to work around some Golang tooling that drops the zero pad resulting in invalid Docker image names being set in config (this is due to Pangeo using zero padding rather than Dask xref dask/helm-chart/pull/265).

consideRatio commented 2 years ago

I am also in favour of dropping the padding.

I think that PyPI/conda-forge coercing to non-zero-padding from zero-padding and the following possible confusion is a strong reasons to switch.

Also, string sorting would be unreliable. For example, below are some pre-releases involved but alphanumerically sorted. There will also be sorting failures if we have X>9. So, while string sorting is probably more correct with a leading zero, its not perfect still.

1.0.0 1.0.0-beta.1 1.0.0a1 2.0.0 2022.05.11 2022.05.2


Technical note: I think YYYY.0M.X refers to zero padding, while YYYY.MM.X refers to not having zero padding.

jacobtomlinson commented 2 years ago

while string sorting is probably more correct with a leading zero, its not perfect still.

This is another compelling point. I occasionally fall into the trap of using string sorting but it can lead to unexpected results with pre-releases. It happens very rarely but can be a time suck to resolve.

Dropping the zero pad will result in folks avoiding string sorting altogether and therefore a whole category of edge-case bugs.

jakirkham commented 2 years ago

Think I +1'd this in the meeting. So just reiterating that here

Wouldn't worry that much about str comparison. The cases where that comes up are typically pretty crude (one off scripts or on the fly usage). There are plenty of libraries and tools (like packaging), which can handle this for users in a smarter way and are easy to pull in. Solvers know how to do this kind of normalization (and if not it is likely a bug).

Python example: ```python from packaging.version import Version L = ["2022.05.00", "2022.4.1", "2022.05.01", "2022.5.2"] sorted(L, key=Version) # preserve inputs list(map(str, sorted(map(Version, L)))) # normalize ```
consideRatio commented 2 years ago

Ah one additional note btw, if we use 2022.5.1 etc instead of 2022.05.1, you suddenly can use all libraries to compare semver versions as 2022.5.1 is a valid semver version where 2022.05.1 isn't. I recall a library that errored instead of sorting 2022.05.1 as it would sort 2022.5.1.

jakirkham commented 2 years ago

What library? Seems like an issue should be raised

jrbourbeau commented 2 years ago

Thanks for the feedback all! It sounds like there's pretty solid interest in removing zero-padding. I'll leave this issue open for bit longer to give folks like @jsignell @fjetter, who are currently out on PTO, to weigh in.

consideRatio commented 2 years ago

What library? Seems like an issue should be raised

2022.05.5 is not a valid semver version (leading zero not allowed), so a library that refutes the version instead of helping out with a sorting operation actually makes sense. I think this was a golang library.

jakirkham commented 2 years ago

If this is limited to Go and Go enforces hard semver (like as a language requirement), then yeah that makes sense

Generally multilingual tools (OS package managers, users space installers, etc.) don't have the luxury of saying what is acceptable version. There are some things out there that not only violate SemVer, but go a bit beyond that (use of letters, odd/even for unstable/stable, etc.). So restricting to SemVer is not practical in the wider world of software hence tooling tends to handle these differences

jrbourbeau commented 2 years ago

I'm going to say lazy consensus has been reached here and will start dropping zero-padding in the upcoming 2022.6.0 release https://github.com/dask/community/issues/252

jsignell commented 2 years ago

Yeah I am totally fine with dropping the zero padding.

jakirkham commented 2 years ago

Sounds good. Closing then