0xPolygonMiden / miden-node

Reference implementation of the node for the Polygon Miden rollup
MIT License
52 stars 37 forks source link

feat: improve `--version` for non-releases #495

Closed Mirko-von-Leipzig closed 2 weeks ago

Mirko-von-Leipzig commented 2 weeks ago

This PR improves --version outputs for the node and faucet by including extra information for non-release builds.

--version is now set to git describe --tags --dirty which is no change for formal releases (since tag == package version), but for non-releases indicates the previous tag, commit count since that tag, commit SHA and appends -dirty if it was built with uncommitted changes.

Motivation is given in #493.

Some points of contention:

  1. I added this to the miden-node-utils crate. However, since this is a build.rs that depends on git data, this means the utils crate will be rebuilt after every git change i.e. this will cause a rebuild of utils and all of its dependents. It is probably better if this became its own top level crate - I decided not to do this because I didn't want to publish such a minor crate. But it might make sense to just do it. Alternatively, we can add this build.rs to every binary we create instead.
  2. The tag found is sensitive. As an example, git describe --tags on the next branch currently returns v0.4.0 because the v0.5.0 tag on main has not been merged back yet. Just something to consider for our release "process" :). This differs significantly to the current output which is v0.6.0 (the in-progress package version).

The --dirty flag currently doesn't work unfortunately. I've opened an issue for this in vergen.

We can also consider additional build descriptions. vergen has an illustrative example for verbose version info, which provides information that could be useful for debugging/reproduction purposes:

~/p/r/app λ app -vv
app 0.1.0

Build Timestamp:     2021-02-23T20:14:46.558472672+00:00
rustc Version:       1.52.0-nightly
rustc Channel:       nightly
rustc Host Triple:   x86_64-unknown-linux-gnu
rustc Commit SHA:    3f5aee2d5241139d808f4fdece0026603489afd1
cargo Target Triple: x86_64-unknown-linux-musl
cargo Profile:       release

Closes #493

Mirko-von-Leipzig commented 2 weeks ago

Can also consider creating a status page for the node. Something to monitor the status of the node and the metadata like block height, node version etc.

Right now we can only "monitor" via the client or submitting a note via the faucet webpage.

bobbinth commented 2 weeks ago

I'm not fully understanding this yet - but a couple of questions/comments:

The effect of this is that we add more info to the --version command, right? And this info now going be based on Git tags rather than Cargo.toml? Or is it going to combine the two somehow?

Generally, I think it is good to add more details to the version like you have with the vargen example, but I think what gets printed out should be consistent with crates.io version. That is, if Cargo.toml says 0.6.1 - and then we can add more info to it.

I added this to the miden-node-utils crate. However, since this is a build.rs that depends on git data, this means the utils crate will be rebuilt after every git change i.e. this will cause a rebuild of utils and all of its dependents. It is probably better if this became its own top level crate - I decided not to do this because I didn't want to publish such a minor crate. But it might make sense to just do it. Alternatively, we can add this build.rs to every binary we create instead.

I also probably won't publish a separate crate just for this. Given that we have only 2 binaries, I'm wondering if putting this just in build.rs of the binaries is a better option?

Can also consider creating a status page for the node. Something to monitor the status of the node and the metadata like block height, node version etc.

Right now we can only "monitor" via the client or submitting a note via the faucet webpage.

Yes, that would be great. This is probably somewhat related to #145 and maybe #83 and #95. But maybe it is worth creating a separate issue (or maybe something to superseded some of these mentioned issues).

Mirko-von-Leipzig commented 2 weeks ago

I'm not fully understanding this yet - but a couple of questions/comments:

The effect of this is that we add more info to the --version command, right?

Correct, more specifically it uses the output of git describe --tags --dirty which is injected at compile time via the build.rs. The output format looks like this:

# If at a tagged commit, only the tag is output
version=TAG             # e.g. v0.5.0

# At commit with <SHA> hash, N commits after the last tag
version=TAG-N-SHA.      # e.g. v0.5.0-12-ABCDEFG

# In addition if the repo contains uncommitted changes then a `-dirty` suffix is added.
version=TAG-dirty.      # e.g. v0.5.0-dirty if directly on a tagged commit, or 
version=TAG-N-SHA-dirty # e.g. v0.5.0-12-ABCDEFG-dirty.

And this info now going be based on Git tags rather than Cargo.toml? Or is it going to combine the two somehow?

This is based entirely on git. If we publish v0.7.0 but never tag it then the --version output will be based on whatever the last available tag is plus the deltas. Tagging the release is considered best practice by both crates.io and github so a gap here would really indicate an issue on our end. However this can be rectified post-publish/release by just tagging it.

And for the binaries in which this was wrong, we would still have access to the correct commit SHA.

Generally, I think it is good to add more details to the version like you have with the vargen example, but I think what gets printed out should be consistent with crates.io version. That is, if Cargo.toml says 0.6.1 - and then we can add more info to it.

This should always be the case unless we screw up the tags. Basing it on the manifest won't work for non-releases as the manifest is always bumped immediately after the previous release, meaning this would be "incorrect" for the duration of next.

If the potential error between manifest and git tag is an issue, then we can leave things as is and instead add this information as extended metadata for the long version output. Non-releases will have the top level version output, but the correct data will still be present for long version. Though I would argue that it's confusing :)

I also probably won't publish a separate crate just for this. Given that we have only 2 binaries, I'm wondering if putting this just in build.rs of the binaries is a better option?

I think that's probably the correct call. Will update first thing next week.

Mirko-von-Leipzig commented 2 weeks ago

I should also add a tag filter so that only version related tags are considered. e.g. if we ever use tags for other things then those will be ignored.

bobbinth commented 2 weeks ago

This is based entirely on git. If we publish v0.7.0 but never tag it then the --version output will be based on whatever the last available tag is plus the deltas. Tagging the release is considered best practice by both crates.io and github so a gap here would really indicate an issue on our end. However this can be rectified post-publish/release by just tagging it.

Currently, my process is a bit different - though, it can of course be changes. Specifically, I usually first publish crates to crates.io and then tag the release.

Also, sometimes I publish patches of individual crates without creating tags. For example, if there was a bug fixed in say the block producer crate, I would publish it under v0.6.1 but keep the rest of the crates at v0.6.0 and the tag would remain at v0.0.6.0. Maybe this is not a good practice though.

If the potential error between manifest and git tag is an issue, then we can leave things as is and instead add this information as extended metadata for the long version output. Non-releases will have the top level version output, but the correct data will still be present for long version. Though I would argue that it's confusing :)

To clarify, would the short version show something like:

miden-node 0.6.1

But the long version would show something like:

miden-node 0.6.1

Build Timestamp:     2021-02-23T20:14:46.558472672+00:00
rustc Version:       1.52.0-nightly
rustc Channel:       nightly
rustc Host Triple:   x86_64-unknown-linux-gnu
rustc Commit SHA:    3f5aee2d5241139d808f4fdece0026603489afd1
cargo Target Triple: x86_64-unknown-linux-musl
cargo Profile:       release

If so, I don't think this is too confusing. But would it satisfy the original goal of this PR?

Mirko-von-Leipzig commented 2 weeks ago

Currently, my process is a bit different - though, it can of course be changes. Specifically, I usually first publish crates to crates.io and then tag the release.

Tagging afterwards is fine - crates.io doesn't actually store any binaries; so each subsequent install will checkout the repo and acquire the tag. There would be a small window prior to tagging where users might get this without the tag.

Also, sometimes I publish patches of individual crates without creating tags. For example, if there was a bug fixed in say the block producer crate, I would publish it under v0.6.1 but keep the rest of the crates at v0.6.0 and the tag would remain at v0.0.6.0. Maybe this is not a good practice though.

This version output only applies to the executables. So long as the tags match the intended versions of the node/faucet we would be okay. There are other options - one can also override these env variables during the build process if required.

I've reverted this change for now; I've only added additional metadata to the long version. The actual version used is now the crate/manifest version. The metadata will give us the additional data we need to accurately debug. We just need to be aware that a user claiming x.y.z version might be referring either to the release version, or the in-progress version.

To clarify, would the short version show something like:

miden-node 0.6.1

But the long version would show something like:

miden-node 0.6.1

Build Timestamp:     2021-02-23T20:14:46.558472672+00:00
rustc Version:       1.52.0-nightly
rustc Channel:       nightly
rustc Host Triple:   x86_64-unknown-linux-gnu
rustc Commit SHA:    3f5aee2d5241139d808f4fdece0026603489afd1
cargo Target Triple: x86_64-unknown-linux-musl
cargo Profile:       release

If so, I don't think this is too confusing. But would it satisfy the original goal of this PR?

I've updated the PR so that the short version output is:

$ miden-node -V

miden-node 0.6.0

and the long

$ miden-node --version

miden-node 0.6.0

SHA:          8d9602c-dirty
branch:       mirko-versionify
features:     testing
rust version: 1.80.1
target arch:  aarch64-apple-darwin
host arch:    aarch64-apple-darwin
opt-level:    3
debug:        false

Feel free to suggest a better formatting; I tried with prefixes like git.XXX and build.YYY or a more toml-like format but in the end I don't care that much :).

I've only implemented this for the miden-node binary for now; I'll copy it to the faucet once we have consensus.

Something else I was hoping to determine that it was built with --locked but unfortunately this isn't possible.

bobbinth commented 2 weeks ago

I guess one thing to confirm: how would it work when the crate is pulled from crates.io? For example, how would SHA and branch fields be set in this case?

Mirko-von-Leipzig commented 2 weeks ago

I guess one thing to confirm: how would it work when the crate is pulled from crates.io? For example, how would SHA and branch fields be set in this case?

I checked using cargo publish --dry-run ... and the git data is indeed just gone. For the current implementation it prints:

/miden-node --version`
miden-node 0.6.0

SHA:          VERGEN_IDEMPOTENT_OUTPUT
branch:       VERGEN_IDEMPOTENT_OUTPUT
features:     None
rust version: 1.81.0
target arch:  aarch64-apple-darwin
host arch:    aarch64-apple-darwin
opt-level:    0
debug:        true

which isn't ideal but I believe this should let us determine whether its a formal release/install at least? There are of course hacks around this if people want to abuse it but that is what it is.

bobbinth commented 2 weeks ago

which isn't ideal but I believe this should let us determine whether its a formal release/install at least? There are of course hacks around this if people want to abuse it but that is what it is.

Should we detect this and not include the missing fields in the output? Would still make it clear if this is a release/install, but also would avoid exposing implementation details (i.e., that we use vergen).

Mirko-von-Leipzig commented 2 weeks ago

which isn't ideal but I believe this should let us determine whether its a formal release/install at least? There are of course hacks around this if people want to abuse it but that is what it is.

Should we detect this and not include the missing fields in the output? Would still make it clear if this is a release/install, but also would avoid exposing implementation details (i.e., that we use vergen).

We can get at least the SHA from a published crate. cargo publish adds a .cargo_vcs_info.json file (except when using --allow-dirty), which we can detect and use instead:

{
  "git": {
    "sha1": "9d48046e9654d93a86212e77d6c92f14c95de44b"
  },
  "path_in_vcs": "bin/node"
}

I'm also taking a quick look into alternatives like built but maybe we don't care about this so much so just replacing the invalid string is enough.

Mirko-von-Leipzig commented 2 weeks ago

I've added a check for the published state (so we still get sha data) and also override the default vergen output in case it somehow slips through.

I've copy pasted this to the faucet instead of the utils crate for the reason given in this comment.

I've tested that it works for cargo publish locally, though I only tested miden-node and not the faucet. Testing is a bit of a pita because publish/package requires all dependencies to have a version specified which of course we don't (all point to next for miden-base):

error: all dependencies must have a version specified when packaging.
Mirko-von-Leipzig commented 2 weeks ago

I've moved everything into the utils crate by:

-- edit --

Okay this doesn't work because utils is compiled first - which means the environment variables aren't present yet, so they're always empty. Back to the drawing board.

-- 2nd edit --

Using your suggestion of a struct + Display implementation so the end caller can inject the variables.