Improve contributing guidelines

jak-pan commented 3 years ago

We are nearing the node's public release and getting more and more contributions, for which we are extremely grateful. ❤️

This means that we're on the right track to building a truly decentralized protocol.

This also means that:

[x] we need to create a contributing system that new people can understand very easily.
[x] we need to make sure that changes done by any contributor are understandable for everybody.
[ ] we will need to make sure that the releases are correctly tagged
[ ] we will need to know if we need to do migrations
[ ] have proper changelogs for node runners

Unfortunately, there is no one silver bullet that can do this AFAIK for our use-case. I tried to combine nuances of conventional-commits, SemVer, cargo workspaces, git tags, and various tooling that combine these things to form a proper, easy to follow system to achieve these goals.

In the perfect world, the ~solution~ to this would be to use https://github.com/conventional-commits to create a standard commit message for every new pull request ~(squashed)~. This would allow us to:

lint commit messages using workflows
have nice commit history that is understandable
know when to do migrations
generate proper changelogs
generate (some) appropriate tags

Unfortunately, conventional-commits don't fully support a monorepo environment like ours: https://github.com/conventional-commits/conventionalcommits.org/issues/269 nor commits tackling multiple scopes: https://github.com/conventional-commits/conventionalcommits.org/issues/302

However, they significantly increase commit readability, and it is also possible to generate changelogs when we use them. The good news is that we don't need to adhere to this throughout development.

Thanks to tools like: https://github.com/zeke/semantic-pull-requests or https://github.com/amannn/action-semantic-pull-request

We just need to make sure we have one point where we provide that information. It can be a PR title or a single commit in PR. In this case, all PR commits will need to be ~squash~ merged to master. This will be done by core team members that will ensure the proper commit message is selected when ~squash~ merging. Or we can find a tool (merge bot) that will extract this information for us and do the merge.

For actual commit message creation help, there are multiple tools out there, but I have a good feeling about this python version: https://pypi.org/project/commitizen/

As for the SemVer, I've tried to use https://github.com/rust-lang/rust-semverver to automatically parse and bump Cargo.toml using SemVer. Unfortunately, our repository looks like too big of a challenge for it right now. We could find some tool that will enforce bumping a version of (at least) nearest Cargo.toml if *.rs file has been changed.

We will need to define rules by which versions should be bumped, especially in the HydraDX-node repository, as breaking change for a pallet doesn't mean that the whole repo or runtime should have a new version. Instead, we should definitely check if that change requires storage migration and runtime upgrade or not. This could be auto-checked by a workflow where node spins up and try to join existing testnet.

When we have all of this, we can use a tool that will build changelogs: https://crates.io/crates/convco and subsequently, publish releases for us: https://github.com/paritytech/cargo-unleash

As for the contributing guidelines and examples of how such a thing can work, I've set up a new repository here. https://github.com/galacticcouncil/guidelines-example

Please try to open an issue, commit stuff (possibly with conventional commits), open a PR and play with it a little, and LMK what is missing and what should be changed. I would like to push this to all of our repos.

If you have any other ideas or suggestions or have better solutions to these, please join the discussion! We'd like to build an environment where it is a joy to develop, not enforce some arbitrary rules that nobody wants to follow.

Outstanding:

[ ] Merge bot that understands conventional commits
[ ] Changelog build
[ ] Workflow for helping with SemVer
[ ] Workflow for figuring out breaking changes for Runtime
[ ] Tagging

EDIT: Removed squash merge requirement as it will lead to wrong attribution and unclear blame. As the semantic-pull-request app can select either a commit message or merge commit name, we only need one of these to be present. And we can also do more complicated refactors with multiple commits if we want them to land in changelog.

RoboRambo commented 3 years ago

As for the versioning:

While I agree with the semVer reasoning for when to update versions, I do think that versioning using integers seperated by dots only also adds a layer of indirection. Often enough one would have to check changelogs when trying to find out which versions are compatible or when which feature was added.

Think of it as using coordinates vs using adresses.

While I havent thought through all the details, Calendar Versioning might provide a way to have versions which are more expressive.

OpenSCAD for example has versions like "2021.01", but one could also have a "2021.06-mainnet" version if wanted.

Maybe this is worth considering, as we are AFAIK a time-sensitive project. We do have milestones with due dates.

jak-pan commented 3 years ago

@RoboRambo

I was reading on CalVer as I've not been very familiar with it before (other than Ubuntu). It looks like a good idea for projects that want to be backwards compatible and state the intent for compatibility in time like APIs.

It however looks like the very opposite of your argument here is true.

You don't see that something has a breaking change when looking at CalVer. SemVer was made exactly for this reason. I would argue for using it, as we will have breaking changes and we definitely want to show it to other developers by using version numbers, so that they can safely use all the tooling that is already present in all standard package managers: https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html

Rust itself follows SemVer.

I just don't see any benefit to this here.

We'll definitely tag releases with dates tho.

RoboRambo commented 3 years ago

I do see your point when one is interested in breaking changes (what are these?), as CalVer could introduce a layer of indirection. (technically one could tag versions for breaking changes, however this is not recommended)

Short: Happy to adhere to SemVer specifications/suggestions. :)

Long: However: Semver states

But what if I leave the API as-is and rewrite everything thats underneath it? I'd increment MINOR and one could think its an easy upgrade. If your code interoperates closely with another component, you might have to diff the changes like donald stufft before upgrading.

My opinion in general (I do understand that I'm outnumbered): As one cannot rely on looking at MAJOR to find "changes which can break things", when versioning one should not try to give people that sense of security. I have to agree with Mahmoud Hashemi and I have to ask myself: what is a specification worth if a project cannot be tested for adhering to that specification?

I'm not saying CalVer is the solution and or that Calver should be used here, though. SemVer has been used widely and successfully often enough.

Good Reads: Versioning discussion on ycombinator Donald stuff's take on versioning Mahmoud Hashemi's take on versioning 0ver

jak-pan commented 3 years ago

But what if I leave the API as-is and rewrite everything thats underneath it? I'd increment MINOR and one could think its an easy upgrade. If your code interoperates closely with another component, you might have to diff the changes like donald stufft before upgrading.

My opinion in general (I do understand that I'm outnumbered): As one cannot rely on looking at MAJOR to find "changes which can break things", when versioning one should not try to give people that sense of security. I have to agree with Mahmoud Hashemi and I have to ask myself: what is a specification worth if a project cannot be tested for adhering to that specification?

Oh ok! We're operating under wrong assumptions here.

I strongly disagree to the article you posted. I agree with some of those arguments by themselves but it assumes everybody uses SemVer wrong and builds an argument on that wrong assumption.

For sure... Yes. If you don't follow it, it will not work. Breaking change means that you make any changes to existing public API and this means API of any module or package, you must increase MAJOR version of that module. And this can be easily checked by testing.

If your package is well covered by tests and by introducing a change your original test breaks, you have made a breaking change. It's very simple. I'd go for that with the version of the node itself.

If by introducing new things it stops working and throw error until increasing spec_version, we have introduced a breaking change. I think that most of this can be checked inside of a workflow.

Of course we can make mistakes, but if we adhere to this we should be good and happy!

jak-pan commented 3 years ago

As we're nearing the release, I'd like to close this and setup some workflows that will make this easier. @RoboRambo you'll have to trust me on the SemVer of the node itself.

I'm still thinking about the releases though. Do we also want to follow runtime version?

I guess CalVer could be used but I'm not sure about mixing things up together. @martinfridrich @lumir-mrkva @enthusiastmartin @green-jay @Roznovjak any thoughts?

mrq1911 commented 3 years ago

semver makes the most sense for us as the rest of the ecosystem uses it and there is tooling supporting it. identifying breaking changes could be pretty tricky though as we should be always backward compatible and forward compatible most of the time as well, as the runtime execution itself can be always performed in wasm virtual environment and that's the part that changes the most. there will be only rarely breaking changes that would force you to update the node.

I'm glad that squash merges are not required as that always reduces information granularity about changes.

jak-pan commented 3 years ago

semver makes the most sense for us as the rest of the ecosystem uses it and there is tooling supporting it. identifying breaking changes could be pretty tricky though as we should be always backward compatible and forward compatible most of the time as well, as the runtime execution itself can be always performed in wasm virtual environment and that's the part that changes the most. there will be only rarely breaking changes that would force you to update the node.

I'm glad that squash merges are not required as that always reduces information granularity about changes.

Actually it is possible to test this. If you don't touch spec_version and introduce breaking change node will not be able to continue working this will be an indication of migration required and requirement for bumping spec_version and also runtime version.

enthusiastmartin commented 3 years ago

semver makes most sense for me too.

As for the commit messages - i would go as far that each commit has to follow the conventional commit.

Regarding squashing - not mandatory but sometimes it makes sense - so use common sense.

jak-pan commented 3 years ago

semver makes most sense for me too.

As for the commit messages - i would go as far that each commit has to follow the conventional commit.

Regarding squashing - not mandatory but sometimes it makes sense - so use common sense.

I don't want to make it hard for new contributors. If there is more important stuff in a PR you can do multiple conventional commits to show up in changelog but I don't think it's neccessary to have every commit there.

jak-pan commented 3 years ago

Maybe we'll also need this https://colineberhardt.github.io/cla-bot/

galacticcouncil / hydration-node

Improve contributing guidelines #121