Version 22.11rc1 - 2022-11-10 has a weird version number

SimonVrouwe commented 1 year ago

According to doc

this project adheres to Semantic Versioning.

whitslack commented 1 year ago

The general trend toward calendar-based software version numbering is disappointing to me because it dispenses with all of the benefits of semantic versioning, most beneficial of which being the ability to tell at a glance whether two versions are API-compatible. Looking at CLN versions 22.11 and 23.01, it will be impossible to know at a glance whether the differences between them comprise bug fixes and minor feature enhancements or sweeping, breaking changes. I really don't understand why calendar-based version numbering is gaining popularity in the software development community. How does the date that a release was tagged convey any relevant information regarding that release's significance in the progression of the project? Alas, in my experience it's usually fruitless to try to argue against decisions like this. Whoever decided to stupidify CLN's version numbering scheme likely won't be convinced they're wrong.

grubles commented 1 year ago

The new versioning seems to break at least one plugin (I haven't checked others). Peerswap was also affected.

plugin-feeadjuster.py: ValueError: 22.11rc1 is not valid SemVer string

whitslack commented 1 year ago

ValueError: 22.11rc1 is not valid SemVer string

That's only because of the rc1. 22.11 by itself is a syntactically valid SemVer string, although it doesn't represent a semantic version.

cdecker commented 1 year ago

Yes, I totally understand that this comes with very little public discussion, so let me add a bit of context to the switch in versioning scheme that we plan to do with this release.

From very early on, I think we started in 2019, we set ourselves a very fast release cadence of anywhere between 2 to 3 months between releases. This was for a couple of reasons:

The Lightning Specification is a very fast moving target with many changes being discussed during the bi-weekly spec meetings. An implementation is almost forced to follow a similarly fast cadence otherwise we cannot confirm that protocol changes are interoperable (at least two implementations are required for a spec change to be verified and eligible to become part of the spec).
Similarly, incompatibilities between implementations need to be addressed in a reasonable amount of time. When someone detects an incompatibility, we usually go back to the spec, check which party is at fault or if it turns out it is still undefined in the spec we have to pick one, and the other one has to address the issue with a fix. A slow release cadence would essentially make the network unusable, or worse, we'd be encouraging a monoculture of implementations (bad for reasons I won't go into detail now).
We also don't want users to have to wait for a new feature for too long, and even more importantly we don't want them to wait too long for fixes either.
We want to avoid crunch times as much as possible. By having the next release not too far after the previous one it is less of an issue that the big feature one developer wanted to get in didn't make it, since it is at most delayed by 3 months. This is important in open-source, since developers, in particular volunteers, are a scarce resource, and we shouldn't add time pressure to the equation.

3 months seems like a good tradeoff between overhead and speed to deployment. Why am I telling you all about this? Well, it turns out that semantic versioning, which we've used so far, is great as long as the surface of the project is well-scoped and limited in size. This allows for breaking changes to be a) identified and b) deferred until they have a breaking release window again. With CLN that is not the case: it's a project with a large surface area and many interfaces, where potential incompatibilities are often hidden, and deferring changes just not to break compatibility can be dangerous (we're handling money after all).

That has often meant that the difference between major and minor version bumps have mostly been arbitrary, with some minor versions introducing cool new features, and major versions mostly consisting of bug fixes. So I think it is fair to say that we weren't following semantic versioning closely, due to the complexities involved in tracking what is and isn't backwards compatible, and that the switch in versioning scheme is us recognizing that and rather embracing a scheme that actually has more information (release date and thus age of a node).

P.S.: As the release captain for this version I was the one who chose to go down this path, so please blame me for any fallout from this decision

rustyrussell commented 1 year ago

We never adhered to semantic versioning! It's not even clear what that means:

Protocol compatibility (we only broke protocol once, for the spec rewrite)
JSON-RPC compatibility: we add features every time, and remove deprecated features almost every release.
Plugin compatibility: slightly less volatility than above, but neither a useful thing for most users to know, nor something we track.

We should stick with - before rc markers though, since that keeps semver happy and is consistent with the past.

We will also use the third point for bugfix releases.

The real reason to use dates is the lack of otherwise sensible semantics ends up with the Release Captain wasting time thinking about "is this a major release?".

SimonVrouwe commented 1 year ago

Thanks for the context,

So I think it is fair to say that we weren't following semantic versioning closely, due to the complexities involved in tracking what is and isn't backwards compatible, and that the switch in versioning scheme is us recognizing that

Fair enough, admission is the first step...

and rather embracing a scheme that actually has more information (release date and thus age of a node).

Will there (still) be a minimum transition period (in the JSON-RPC API) before deprecated features are removed?

We never adhered to semantic versioning!

then please don't claim you do - in the doc :)

whitslack commented 1 year ago

I, too, appreciate the admission that semantic versioning requires more mindshare than any of the devs are willing to allocate to it. That's fair.

It's not even clear what that means

My wish is that it would mean knowing where the "ratchet points" are — the versions that, after being run once, cannot be reverted to before. My nightmare scenario is upgrading to a new version that successfully upgrades the database but then consistently crashes (or otherwise refuses to maintain connections with peers). Best case: I'd have hundreds of peers force-closing my channels, costing me a pretty penny in fees. Worst case: I wouldn't even be able to get my node running to watch out for cheating attempts. Upgrading this software is a very risky proposition, far more so than any other software package I've ever experienced. And my fears aren't without precedent either, as seemingly every new CLN release engenders bug reports from users who are left unable to start their nodes and waiting for a hotfix. It would be very helpful to know which releases are these high-risk ratchet points, and that's what I would hope to see in the "major" version numbers. Just my 2 millisats.

cdecker commented 1 year ago

Will there (still) be a minimum transition period (in the JSON-RPC API) before deprecated features are removed?

Absolutely! The existing deprecation mechanism will continue to be used, with changes being introduced behind the --allow-deprecated-apis flag, allowing integrations and apps to test if anything breaks with ~6 months of lead time, and only then are changes made permanent, by removing the deprecated code or switching the default behavior. That guarantee becomes easier to track too, since date-based version number immediately tell you whether a deprecation is going to be enforced, due to the 6 months spanning pretty much exactly 2 releases.

then please don't claim you do - in the doc :)

Agreed, I will remove that claim from the doc.

My wish is that it would mean knowing where the "ratchet points" are — the versions that, after being run once, cannot be reverted to before. My nightmare scenario is upgrading to a new version that successfully upgrades the database but then consistently crashes (or otherwise refuses to maintain connections with peers). Best case: I'd have hundreds of peers force-closing my channels, costing me a pretty penny in fees. Worst case: I wouldn't even be able to get my node running to watch out for cheating attempts. Upgrading this software is a very risky proposition, far more so than any other software package I've ever experienced. And my fears aren't without precedent either, as seemingly every new CLN release engenders bug reports from users who are left unable to start their nodes and waiting for a hotfix. It would be very helpful to know which releases are these high-risk ratchet points, and that's what I would hope to see in the "major" version numbers. Just my 2 millisats.

Definitely a vision that I share too, however difficult to realize in reality: database changes indeed have this ratchetting behavior, and we recently introduced the --database-upgrade=true option to prevent accidentally ratching up. However the issue remains: at some point users will have to upgrade, at which point they will apply the DB changes, and not be able to downgrade anymore. The more beta testers on release candidates and running bleeding edge we have the better we can ensure that nothing breaks. The bigger the changes between known good version upgrades the better we can predict if something will break.

So while I agree that the one-way upgrade is sub-optimal, and the temptation to batch them as much as possible is great, batching can also increase the chances of a breaking change, which would then lead to longer downtime and very frantic releases to address the fallout because no-one tested before the final release.

whitslack commented 1 year ago

What kills me is that most of the database schema upgrades are, in principle, reversible, as seldom do schema upgrades actually destroy any information; they mostly just move it around and augment it. It would be nice if each reversible database schema upgrade would store a "downgrade statement" in a table, keyed by schema version. Then, if lightningd discovers that the database schema is currently at a newer version than the version it supports (i.e., the user has downgraded the software) and if the user has set --database-downgrade=true, then the statements in the downgrades table will be executed (in reverse order by schema version) to revert the schema back to the version expected by the software currently being run. I recognize that such a mechanism would not be able to revert all upgrades since some reversions would require executing complex procedures that cannot easily be coded in SQL, but I think most reversions could be coded as SQL.

cdecker commented 1 year ago

I'd be very happy to add a downgrade or unapply operation to the DB, however the issue is in how to distribute them:

Each binary knows how to get from an empty or partially migrated database to its own version
If a newer version starts it can upgrade the DB to the version it wants because it knows the migrations to do so
If an older version starts it does not know about any migrations that were applied after its own state, so it'd also not know the downgrade migrations since it predates those migrations, so how could it downgrade?

I think it might be easier to have a separate migration tool that can be apply or unapply changes, as long as the migrations are reversible in the first place. How does that sound?

whitslack commented 1 year ago

If an older version starts it does not know about any migrations that were applied after its own state

@cdecker: Respectfully, please re-read my previous comment more carefully. I proposed that each schema upgrade would store the corresponding downgrade statement(s) in the database.

cdecker commented 1 year ago

I see, I missed that part indeed. Could be a solution indeed. Let me mull this over a bit

Apologies for misreading earlier, I am a bit distracted by building the release 🙂

m-schmoock commented 1 year ago

22.11 by itself is a syntactically valid SemVer string

Nope it doesn't. Semver needs at least three parts major.minor.patch. Having only two is not valid. I'm currently removing the semver from the plugins again and try to come up with some backwards and forward compatible string magic -.-

vincenzopalazzo commented 1 year ago

I'm currently removing the semver from the plugins again and try to come up with some backwards and forward compatible string magic -.-

The pythonic solution could be to catch the exceptions and check the the fix two digits are >= 22, and we keep this monkey patch for 6 months like a deprecated feature lifecycle

rustyrussell commented 1 year ago

Database upgrades have been a pain point in the past, since they're hard to test (I only have one ancient db to treat on and it's the one on my node). Testing downgrades is likely to be even more cursory, so we could really screw things up!

We can't reverse adding columns (a common upgrade) when there are references to the table, for example. Thanks sqlite3! Leaving them around on downgrade is possible, but then breaks when you try to upgrade again.

Generally, we start using the new db fields immediately. So now we need another gate on "is this reversible?" for each of these cases. It's not clear to me in practice how often you would be unable to downgrade.

Unfortunately, it only takes one if these "downgrade stoppers" per release to make the entire effort pointless :(

ElementsProject / lightning

Version 22.11rc1 - 2022-11-10 has a weird version number #5716