Replace ``commit_hash.txt`` with git archive placeholders (and try also replacing ``prerelease.txt``)

ekpyron commented 4 years ago

The github source archives are invalid in that they are lacking proper commit_hash.txt and prerelease.txt files and we instead have to manually pack another source archive, which is confusing and error-prone.

The github source archive is created with git archive and https://git-scm.com/docs/gitattributes#_export_subst would allow us to create a commit_hash.txt file with one of the placeholders like https://git-scm.com/docs/git-log#Documentation/git-log.txt-emHem

Then our build scripts could check if the commit_hash.txt contains that placeholder and assume it's a git checkout and determine the hash using git and otherwise assume it's a release archive and that commit_hash.txt contains the proper commit hash.

prerelease.txt could be replaced for example by inspecting the top of the changelog - or maybe https://git-scm.com/docs/git-log#Documentation/git-log.txt-emdem can be used to check for tags in the decorators.

cameel commented 2 years ago

I think we should also document the mechanism in the docs as a part of this task. It will become less necessary to do this manually but the mechanism is still not obvious and we should have a place with an explanation to point people to.

cameel commented 2 years ago

The script that currently generates these files (and the source tarball) is create_source_tarball.sh.

cameel commented 2 years ago

Apparently there's an additional use case not mentioned in the issue description:

@ekpyron

It would be nice if checking out a release tag via git would build a release without user interaction.

Do we want that as a part of this too? I assumed it's only about github's source packages and that we're just going to keep the current system outside of that usage. If we do want prerelease.txt and commit_hash.txt to be automatically populated we probably should use git hooks for that.

cameel commented 2 years ago

Here's what I think we should do for this issue: 1) Update our scripts and cmake config to treat the file with a placeholder the same way as if the file did not exist at all. Do not try to get the right value automatically when the file is being read and do not fill out the template automatically outside of git archive.

This way anyone just checking out the repo and building without touching these files will still get the same behavior as now (i.e. always a pre-release build). 2) Provide a script that gets the right values and fills out the templates.
Users can easily run it to get a release build after checking out a tag. 3) (Optional) Provide a post-checkout hook so that it's easy to have your repo configured for git to run the script automatically on every checkout. The thing with hooks is that they are not installed automatically when you clone a repo. So I'd do it like this:
Create the hook in scripts/git-hooks/post-checkout. The hook should simply call the script from (2).
Create a script called scripts/install_git_hooks.sh. It would copy this hook (and any other hooks we decide to add in the future) to the hook dir (i.e. the dir returned by git rev-parse --git-path hooks).

ekpyron commented 2 years ago

So after extensive discussion with @cameel I think we arrived at the following conclusion regarding the prerelease logic:

We create a prerelease.txt file in the repository, but not in the repository root. I suggest cmake/prerelease.lock. The contents can be something like If this file is present, a prerelease will be built by default. You can override this with cmake options ...<mention cmake options below>...
We use export-ignore, s.t. this file will not be part of the github source archive.
If cmake/prerelease.lock exists, we build a prerelease by default, so cmake/scripts/buildinfo.cmake sets SOL_VERSION_PRERELEASE to develop.<date>.
If cmake/prerelease.lock does not exist, we build a release by default, so cmake/scripts/buildinfo.cmake sets SOL_VERSION_PRERELEASE to empty.
If an old-style prerelease.txt file exists in the repository root, we raise a fatal cmake error, explaining the new mechanism.

Additionally, we want to provide cmake options to override this default behaviour:

We provide a new cmake string option SOL_PRERELEASE_STRING which can be used to override the prerelease string, i.e. the value of SOL_VERSION_PRERELEASE in cmake/scripts/buildinfo.cmake (i.e. if SOL_PRERELEASE_STRING is non-empty, we use its content as SOL_VERSION_PRERELEASE)
We provide a new cmake boolean option SOL_FORCE_RELEASE to force a release build (empty SOL_VERSION_PRERELEASE in cmake/scripts/buildinfo.cmake even if cmake/prerelease.txt exists). If this option is true, SOL_PRERELEASE_STRING needs to be unset/empty.
We use the cmake string option SOL_COMMIT_HASH to replace the commit_hash.txt file we've been using so far. If SOL_COMMIT_HASH is non-empty, we use its content and ignore the file, otherwise, if the file exists and actually contains a commit hash, we use that hash. Otherwise we set it to the default value.

When it's done:

Document the new cmake options under Building from Source. Also include a note explaining how the mechanism worked until now.
Remember to update CI and scripts to use the new mechanism.

EDIT by @cameel: Updated with my suggestions from the chat. EDIT by @cameel: Renamed all options to use SOL_ prefix (rather than SOLC_). I noticed that the original description was mixing these two and having both SOL_COMMIT_HASHand SOLC_COMMIT_HASH at the same time is just asking for bugs. EDIT by @cameel: Changed the part about commit_hash.txt.

ekpyron commented 2 years ago

The commit_hash.txt logic can stay as already done in the PR https://github.com/ethereum/solidity/pull/12717. (This is: the file is checked in with a pattern, which is replaced by the hash on git archive. If the file contains that pattern, we use git to determine SOL_COMMIT_HASH, otherwise we use the file contents) If we want we can additionally add a cmake option to override it as well.

ekpyron commented 2 years ago

Note that we will also need to check all release-related scripts and change them for towards the new mechanism. scripts/release_ppa.sh for example.

cameel commented 2 years ago

We also need to decide if we want to drop the old source archive and make the script just a wrapper over git archive. @ekpyron just noticed that they include some extra stuff like the jsoncpp sources. I think we could drop them anyway because they don't include other deps like ranges-v3 and are thus not self-contained but we'll need to discuss that with @chriseth.

ekpyron commented 2 years ago

If we're not fine with not having dependencies in the source archive, we can drop the entire idea, because there's no way to achieve that with git archive, resp. in github source archives :-). But I'd also say that it's a non-issue - and the fact that nobody ever noticed that range-v3 is not packed in there is proof of that.

cameel commented 2 years ago

Even if we don't drop the other archive I think there's still some value in the default one being somewhat usable.

chriseth commented 2 years ago

I'm fine with removing the jsoncpp source

aarlt commented 2 years ago

I'm somehow wondering whether we could achieve this without having any magic files like cmake/prerelease.lock. Wouldn't it be possible to have all this just managed by CMake?

aarlt commented 2 years ago

In general I think that there must be an easy way to use those source tarballs. That means all dependencies should be either included (I think it's not possible, e.g. to always include all boost stuff), or a t least listed in a format, so that an enduser can automatically install them. Maybe we could use something like vcpkg to manage those dependencies better.

cameel commented 2 years ago

I'm somehow wondering whether we could achieve this without having any magic files like cmake/prerelease.lock. Wouldn't it be possible to have all this just managed by CMake?

Unfortunately the automatically created source archive is not usable without this mechanism. Having this file is the only way we came up with so far to have the source result in a release build automatically. This is because git archive used by Github lets us skip files from it but not add them.

That means all dependencies should be either included (I think it's not possible, e.g. to always include all boost stuff), or a t least listed in a format, so that an enduser can automatically install them.

If they're missing, they get downloaded automatically by cmake. See jsoncpp.cmake and range-v3.cmake. Which is exactly why no one noticed so far that ranges-v3 was not included in the old archive.

Maybe we could use something like vcpkg to manage those dependencies better.

There's an ongoing discussion on this: https://github.com/ethereum/solidity/issues/8860#issuecomment-932787258. No consensus so far though. Last time I asked, @chriseth was not convinced that introducing such a system for managing dependencies is an improvement over just keeping dependencies to a minimum. And others have varied opinions.

aarlt commented 2 years ago

If they're missing, they get downloaded automatically by cmake. See jsoncpp.cmake and range-v3.cmake. Which is exactly why no one noticed so far that ranges-v3 was not included in the old archive.

Ah that's true! Somehow I forgot that we have still our build system present in the tarball ;)

However, I think we don't really have boost managed, and also not z3, right? Both depend on the version installed on the system. I think it would be great to have those also managed by us. I think that would be a point for moving forward with something like vcpkg.

cameel commented 2 years ago

Right, boost is a major pain. Personally, I'd be fine with something like vcpkg as an option, as long as it integrates with our cmake system and is not required if you prefer to install dependencies in a different way.

cameel commented 2 years ago

We discussed the PR today on the chat. Here's a summary of what we agreed on:

Building a release must be automatic. Change the rules as follows:
- If .git directory exists and we're on a tagged commit, it's a release
- If .git directory exists and we're not on a tagged commit, it's a pre-release
- If .git directory does not exist and commit-hash.txt contains an actual commit hash, it's a release.
- Otherwise, report an error.
We do want the overrides (forcing a release build, setting pre-release string and overriding the commit) but they should be used sparingly. With the automatic mechanism they should rarely be needed even in our CI.
- To make the PR simpler we can drop SOL_FORCE_RELEASE and SOL_COMMIT_HASH options (only keep them as variables as they were until now). The magic files are enough.
- We still need SOL_PRERELEASE_STRING option for setting the value for the pre-release string.
  - It's an error to use it in a release build.
Source archives need to be tested. We should have at least one job that builds from the source archive. (#12900)

ekpyron commented 2 years ago

Note that .git can also be a file instead of a directory in git worktrees. Also in the release_ppa.sh script we will need to keep the .git directory for prerelease ppa builds.

ekpyron commented 2 years ago

Just in case it helps: I had some local snippets yesterday to see how to robustly check, if we're on a tag and ended up with this:

        execute_process(
                COMMAND git tag --points-at HEAD
                OUTPUT_VARIABLE SOL_TAGS OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET
                COMMAND_ERROR_IS_FATAL ANY
        )
        string(REPLACE "\n" ";" SOL_TAGS "${SOL_TAGS}")
        list(FILTER SOL_TAGS INCLUDE REGEX "^v[0-9]+[.][0-9]+[.][0-9]+$")
        if(SOL_TAGS)
                message("Release tag!")
        else()
                message("Not a release tag!")
        endif()

Not sure that's the best way to do it, but maybe it helps as a basis.

ekpyron commented 1 year ago

We should also consider removing scripts/build.sh and adjusting the docs, when this is done.

ethereum / solidity

Replace ``commit_hash.txt`` with git archive placeholders (and try also replacing ``prerelease.txt``) #9720