cpp-pm / hunter

CMake driven cross-platform package manager for C/C++.
BSD 2-Clause "Simplified" License
600 stars 179 forks source link

Not needing forks to "hunterify" packages #28

Open bkotzz opened 4 years ago

bkotzz commented 4 years ago

Current state: When adding a new package to hunter, many times we need to fork that package to make modifications to it. These commonly are:

Needing to make forks to add a package creates an extra overhead that would be nice to avoid. However, we probably can't get away with needing to make changes to packages to use them, because we rely on "CMake best practices" that upstream maintainers either haven't added or aren't interested in adding.

A more lightweight version:

Here is an example of a PR to a fork, to hunterize a package https://github.com/hunter-packages/Bento4/pull/1, at version 1.5.1-628-e6ee435-p0. Instead of making a fork, and pushing a commit to it, we could generate a patch, with git diff HEAD~1 HEAD > 1.5.1-628-e6ee435-p0.patch. We could then take that patch, and when making the PR against hunter too add the package, like https://github.com/ruslo/hunter/pull/1797, we would include the patch file. So we would have cmake/projects/bento4/1.5.1-628-e6ee435-p0.patch. Then, our hunter config would look like this, using the upstream release tar:

 hunter_add_version(
    PACKAGE_NAME
    bento4
    VERSION
    "1.5.1-628-e6ee435-p0"
    URL
    "https://github.com/hunter-packages/Bento4/archive/v1.5.1-628-e6ee435-p0.tar.gz"
    SHA1
    f296f38004cd6523eed12ce15cbbfaa8ce6fa050
    LOCAL_PATCH
    ON
)

After downloading the source, and before building it, we would have an extra step:

> download https://github.com/hunter-packages/Bento4/archive/v1.5.1-628-e6ee435-p0.tar.gz
> extract v1.5.1-628-e6ee435-p0.tar.gz
> cd v1.5.1-628-e6ee435-p0
> git apply ${HUNTER_ROOT}/cmake/projects/bento4/1.5.1-628-e6ee435-p0.patch, if it exists

To support packages with multiple versions in hunter, we could keep multiple patch files in the project folder in hunter.

Benefits:

Disadvantages:

Summary:

dan-42 commented 4 years ago

forking brings also another advantage, It allows shipping only the production code.

For Example Boost is stripped from docs, examples and tests. This reduces the size someone needs to download(original size boost 1.71.0 *.tar.gz is 115MB, stripped is 25 MB)

Another package where the size bugs me is nlohman_json. This is a single file header-only lib (~750KB), but hunter downloads the complete repository which contains a lot of tests and test data (*113MB as .tar.gz**)

bkotzz commented 4 years ago

It’s a good point - thanks for raising this. I’m in favour of keeping both mechanisms, and using patches for the simple hunterizations, and forking for the more surgical ones.

caseymcc commented 4 years ago

I am not sure the patch buys us anything. You still have to keep history of the patch (in essence a repo) in order for new package updaters to understand what was needed to support hunter on the previous versions. You still have to store it somewhere (likely in a repo) as storing it in hunter would bloat it. For this reason I think fork is the better tool as the history is there, the ability to branch each release allows better for incremental updates -p0, -p1, etc... Personally I don't understand the fear of forking, if I was paying for storage or something like that I might be worried but otherwise I find it convenient.

bkotzz commented 4 years ago

@caseymcc

You still have to store it somewhere

I think if we only used this approach for packages that require minimal changes, like the one I referenced in the Bento4 example above, I think it could be stored in the hunter repo itself, and would not cause bloat. Most of the packages in hunter required very small changes, which would equate to a patch of a few kilobytes. Even if we did that for a few hundred packages and a few versions per package and it added a few MBs to hunter I don’t think that would cause many problems with download times or disk usage.

Personally I don't understand the fear of forking

I think people have different opinions on this. People who contribute a lot probably don’t care. But I think there are many more people who avoid contributing because they find the process to be laborious. Instead of just making a PR against the hunter repo to add a package, you have to make an issue requesting a fork, wait for a maintainer to make you a fork, a maintainer has to make it, then you have to fork the fork and start your work and put up a PR and wait for a maintainer to review, then a maintainer has to approve and release, then you start your branch against hunter, and run your tests, and hopefully your tests pass or else you have to go back and make another PR against the fork, but suppose they do pass, you now have to wait for another review, and a maintainer has to approve and release... I think people would be more willing to contribute if they could add or update a package in a single swoop, rather than this series of steps that can be drawn out and confusing to newcomers

caseymcc commented 4 years ago

@bkotzz

I think once you are at the level of being required to make a patch file to include a package you are already at the level of dealing with forks. As for the process the waiting does not need to happen, maybe we should update the docs if it is not stated but you can immediately fork the project to your user account, make changes, update one of your branches of hunter and test without ever engaging a maintainer. The only interfacing with a maintainer that is needed is when you want it all in the official repos.

With the Lua package I put in the issues, I made the request for the repo but I did all of the above in parallel and had it tested and almost ready before you responded that the repo was created. Now all I did was push the changes to the new repo and now I am modifying my original hunter branch to the new repo release and will put in the PR for hunter shortly.

nasosi commented 4 years ago

I think having the patching capability is a good idea. It can also ease the development of subsequent version patches, since the revision intent becomes very clear. I use this approach with cmake externalproject_add and I find it the most appropriate one. It also reduces concerns about the invasive nature of hunterization that some people may be sensitive about.

If the size of Hunter repo is of concern, then maybe the patch file(s) can live on a separate project on Hunter packages.

geiseri commented 4 years ago

I guess I do not see the issue with patches as debian/open embedded/redhat have been doing it for decades. They do have very established workflows to generate these patches. I do admit these patches look ugly, but there are things like patchwork that make these a bit easier to review. Right now my biggest issue with hunter vs vcpkg is that the workflow is so much more overhead for hunter that I really do not want to bother. If you want more developer contributions I think you might want to examine something that has a lower barrier to entry.