Not needing forks to "hunterify" packages

bkotzz commented 4 years ago

Current state: When adding a new package to hunter, many times we need to fork that package to make modifications to it. These commonly are:

The package does not use CMake
The package use CMake, but forces certain build settings (ie. sets the CXX version to 11), when we want to leave these up to the toolchain
The package has dependencies that need to be installed through hunter (ie. hunter_add_package in addition to the usual find_package)
The package doesn't have a CMake install step, which hunter uses to make the package available to consumers
The package does not have a config file

Needing to make forks to add a package creates an extra overhead that would be nice to avoid. However, we probably can't get away with needing to make changes to packages to use them, because we rely on "CMake best practices" that upstream maintainers either haven't added or aren't interested in adding.

A more lightweight version:

Here is an example of a PR to a fork, to hunterize a package https://github.com/hunter-packages/Bento4/pull/1, at version 1.5.1-628-e6ee435-p0. Instead of making a fork, and pushing a commit to it, we could generate a patch, with git diff HEAD~1 HEAD > 1.5.1-628-e6ee435-p0.patch. We could then take that patch, and when making the PR against hunter too add the package, like https://github.com/ruslo/hunter/pull/1797, we would include the patch file. So we would have cmake/projects/bento4/1.5.1-628-e6ee435-p0.patch. Then, our hunter config would look like this, using the upstream release tar:

 hunter_add_version(
    PACKAGE_NAME
    bento4
    VERSION
    "1.5.1-628-e6ee435-p0"
    URL
    "https://github.com/hunter-packages/Bento4/archive/v1.5.1-628-e6ee435-p0.tar.gz"
    SHA1
    f296f38004cd6523eed12ce15cbbfaa8ce6fa050
    LOCAL_PATCH
    ON
)

After downloading the source, and before building it, we would have an extra step:

> download https://github.com/hunter-packages/Bento4/archive/v1.5.1-628-e6ee435-p0.tar.gz
> extract v1.5.1-628-e6ee435-p0.tar.gz
> cd v1.5.1-628-e6ee435-p0
> git apply ${HUNTER_ROOT}/cmake/projects/bento4/1.5.1-628-e6ee435-p0.patch, if it exists

To support packages with multiple versions in hunter, we could keep multiple patch files in the project folder in hunter.

Benefits:

No more forks
Hunterizing a package, and adding it to hunter happen in one PR - the plain-text patch can be reviewed when adding it

Disadvantages:

We are then restricted to using released archive from upstream repos. Some repos don't release frequently, and the fork model has allowed us to use unreleased code in hunter
The size of the hunter repo increases. The example patch I referenced was very simple, and still is 3.6KB. Having more complex patches, and patches per version for all packages in hunter would easily add 10s of megabytes to the repo

Reviewing patches becomes visually harder, compared to a PR against the fork. For example, we need to read

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 2f08b2f..c7170c3 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,7 +1,7 @@
# cmake -DCMAKE_BUILD_TYPE=Debug ..
# cmake -DCMAKE_BUILD_TYPE=Release ..
-cmake_minimum_required(VERSION 2.8)
-project(bento4)
+cmake_minimum_required(VERSION 3.0.2)
+project(bento4 LANGUAGES CXX VERSION "0.0.0")

Instead of Likely we can support both the fork and patch models, and only use the patch models on small changes. However, small patches cover the overwhelming majority of package forks in hunter.

Summary:

Tracking versioned patches in hunter for each package that requires "hunterization" to use means we can remove the need to fork packages, and can quickly make small modifications to a project's source in a single PR

dan-42 commented 4 years ago

forking brings also another advantage, It allows shipping only the production code.

For Example Boost is stripped from docs, examples and tests. This reduces the size someone needs to download(original size boost 1.71.0 *.tar.gz is 115MB, stripped is 25 MB)

Another package where the size bugs me is nlohman_json. This is a single file header-only lib (~750KB), but hunter downloads the complete repository which contains a lot of tests and test data (*113MB as .tar.gz**)

bkotzz commented 4 years ago

It’s a good point - thanks for raising this. I’m in favour of keeping both mechanisms, and using patches for the simple hunterizations, and forking for the more surgical ones.

caseymcc commented 4 years ago

I am not sure the patch buys us anything. You still have to keep history of the patch (in essence a repo) in order for new package updaters to understand what was needed to support hunter on the previous versions. You still have to store it somewhere (likely in a repo) as storing it in hunter would bloat it. For this reason I think fork is the better tool as the history is there, the ability to branch each release allows better for incremental updates -p0, -p1, etc... Personally I don't understand the fear of forking, if I was paying for storage or something like that I might be worried but otherwise I find it convenient.

bkotzz commented 4 years ago

@caseymcc

You still have to store it somewhere

I think if we only used this approach for packages that require minimal changes, like the one I referenced in the Bento4 example above, I think it could be stored in the hunter repo itself, and would not cause bloat. Most of the packages in hunter required very small changes, which would equate to a patch of a few kilobytes. Even if we did that for a few hundred packages and a few versions per package and it added a few MBs to hunter I don’t think that would cause many problems with download times or disk usage.

Personally I don't understand the fear of forking

I think people have different opinions on this. People who contribute a lot probably don’t care. But I think there are many more people who avoid contributing because they find the process to be laborious. Instead of just making a PR against the hunter repo to add a package, you have to make an issue requesting a fork, wait for a maintainer to make you a fork, a maintainer has to make it, then you have to fork the fork and start your work and put up a PR and wait for a maintainer to review, then a maintainer has to approve and release, then you start your branch against hunter, and run your tests, and hopefully your tests pass or else you have to go back and make another PR against the fork, but suppose they do pass, you now have to wait for another review, and a maintainer has to approve and release... I think people would be more willing to contribute if they could add or update a package in a single swoop, rather than this series of steps that can be drawn out and confusing to newcomers

caseymcc commented 4 years ago

@bkotzz

I think once you are at the level of being required to make a patch file to include a package you are already at the level of dealing with forks. As for the process the waiting does not need to happen, maybe we should update the docs if it is not stated but you can immediately fork the project to your user account, make changes, update one of your branches of hunter and test without ever engaging a maintainer. The only interfacing with a maintainer that is needed is when you want it all in the official repos.

With the Lua package I put in the issues, I made the request for the repo but I did all of the above in parallel and had it tested and almost ready before you responded that the repo was created. Now all I did was push the changes to the new repo and now I am modifying my original hunter branch to the new repo release and will put in the PR for hunter shortly.

nasosi commented 4 years ago

I think having the patching capability is a good idea. It can also ease the development of subsequent version patches, since the revision intent becomes very clear. I use this approach with cmake externalproject_add and I find it the most appropriate one. It also reduces concerns about the invasive nature of hunterization that some people may be sensitive about.

If the size of Hunter repo is of concern, then maybe the patch file(s) can live on a separate project on Hunter packages.

geiseri commented 4 years ago

I guess I do not see the issue with patches as debian/open embedded/redhat have been doing it for decades. They do have very established workflows to generate these patches. I do admit these patches look ugly, but there are things like patchwork that make these a bit easier to review. Right now my biggest issue with hunter vs vcpkg is that the workflow is so much more overhead for hunter that I really do not want to bother. If you want more developer contributions I think you might want to examine something that has a lower barrier to entry.

cpp-pm / hunter

Not needing forks to "hunterify" packages #28