boostorg / release-tools


[WIP] Archive variants #52

Open alandefreitas opened 10 months ago

alandefreitas commented 10 months ago

Add extra archive variants such as docs-only and source-only. These variants can reduce expenses with JFrog download bandwidth, provide users with archives that are simpler to use, and provide docs-only archives for the website.

The MakeBoostDistro.py script includes parameters to determine what types of files should be included in the distribution. All other functions are adapted to handle these requirements accordingly.

fix #50

alandefreitas commented 8 months ago

@sdarwin Should we start iterating on this again?

sdarwin commented 8 months ago

For each of the new archives (docs-only and source-only), run a recursive diff (diff -r dir1/ dir2/) comparing the new results to a traditional archive such as the ones on https://boostorg.jfrog.io/artifactory/main/develop/
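If a scripted comparison is more convenient than reading diff -r output, the same check can be sketched in Python with the standard filecmp module. This is only an illustration: the function name recursive_diff is hypothetical, and both archives are assumed to be already unpacked into directories.

```python
import filecmp
import os


def recursive_diff(left, right):
    """Compare two unpacked archive trees, roughly like `diff -r left right`.

    Returns three sorted lists of paths relative to the roots:
    entries only in `left`, entries only in `right`, and files in both
    whose contents differ.
    """
    only_left, only_right, differing = [], [], []

    def walk(cmp, prefix):
        only_left.extend(os.path.join(prefix, n) for n in cmp.left_only)
        only_right.extend(os.path.join(prefix, n) for n in cmp.right_only)
        # Note: dircmp compares shallowly, so files with identical type,
        # size, and mtime are treated as equal without reading them.
        differing.extend(os.path.join(prefix, n) for n in cmp.diff_files)
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    # ignore=[] disables dircmp's default skip list (RCS, CVS, tags).
    walk(filecmp.dircmp(left, right, ignore=[]), "")
    return sorted(only_left), sorted(only_right), sorted(differing)
```

For a source-only archive, the first list would be expected to be empty and the second to contain exactly what the variant is meant to drop.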

sdarwin commented 5 months ago

The really interesting variant is source-only.

In terms of the website, I don't believe it's worth the complexity to have a docs-only bundle, because cloud storage is not expensive (a few dollars), and continuing to use the "full" archives provides redundancy and simplicity. The web files are full backup copies of each release.

Docs-only is around 30% smaller. However, generating and uploading packages with both FULL and DOCS-ONLY (two packages instead of one) increases the total storage size! That's worse, not better. It also increases the amount of code to debug and maintain. Almost nobody would need to download a docs-only bundle, and if they did, the full archive serves the purpose. I propose commenting out the functionality of docs-only. Don't generate such an archive. Otherwise, make an argument for why docs-only should be kept.
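The storage point is simple arithmetic. A minimal sketch, with sizes normalized so that the full archive is 1.0 and the 30% figure taken from the estimate above (the function name is hypothetical):

```python
def total_storage(full_size, docs_only_reduction):
    """Total size when both the full and the docs-only archive are stored."""
    docs_only_size = full_size * (1 - docs_only_reduction)
    return full_size + docs_only_size


# Storing both archives costs about 1.7x the full archive alone,
# versus 1.0x when only the full archive is kept.
print(total_storage(1.0, 0.30))
```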

In terms of source-only, how about these other choices.

or

Consider other folders such as test/ and example/. In those cases, "code" remains, but anything that isn't code, such as txt, tar, json, and README files, is gone. That means the examples and the tests are probably 50% broken. Who is going to try to use examples/ or tests/ when files are missing?

Leaving things in a half-way useless state isn't helpful. At that point, why not go even further? Either have all tests/, or no tests/, but not broken tests where some critical .json files are missing so the tests won't run.

Another option is what Peter has started doing in GitHub Releases: boost-1.85.0.beta1-b2-nodocs.tar.gz. The -nodocs archive probably has all files without anything removed. The only difference is that the "docs" haven't been generated, so it saves around 25MB of space. But there won't be any controversy about the contents of the archive since it contains everything.

What are viable options that could be published with minimal controversy or confusion to end-users, and that continue to work as expected? The result should have a simple story/explanation and also be useful to the developer.

  1. Leave everything but don't "build" the docs, as in Peter's -nodocs. However, this is too easy somehow, and not a huge storage savings.

or

  2. A number of software projects out in the world keep their documentation in a separate git repository. That is clear enough. It's understandable. If we strip out all doc/ folders from all libraries, and the top level, but leave everything else intact, it doesn't break "tests", or "examples", or anything else. Delete all doc/ folders. Nothing else is modified. The archive size should be quite small.

or

  3. Include the boost/ folder and basically nothing else, for maximum reduction. Consider that on Ubuntu, the libboost-all-dev package will install "boost". Examining the results of installing libboost-all-dev, it will include /usr/include/boost/beast and /usr/include/boost/url but NOT anything from src/ such as url/src/segments_encoded_view.cpp. Therefore, the package corresponds to boost/ only.

or

  4. None of the above.
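To make option 2 concrete, here is a rough sketch of stripping every doc/ directory from an unpacked copy of a release tree. It is an assumption-laden illustration, not the behavior of MakeBoostDistro.py: the function name strip_doc_dirs is hypothetical, and it removes any directory named exactly "doc" at any depth, which may be broader than intended.

```python
import os
import shutil


def strip_doc_dirs(root, names=("doc",)):
    """Delete every directory under `root` whose name is in `names`.

    Works in place, so run it on a scratch copy of the release tree.
    Returns the removed directories as sorted paths relative to `root`.
    """
    removed = []
    for dirpath, dirnames, _ in os.walk(root, topdown=True):
        for name in list(dirnames):
            if name in names:
                target = os.path.join(dirpath, name)
                shutil.rmtree(target)
                # Tell os.walk not to descend into the deleted directory.
                dirnames.remove(name)
                removed.append(os.path.relpath(target, root))
    return sorted(removed)
```

Directories such as test/ and example/ are untouched, so whether the build still works afterwards depends only on whether anything outside doc/ refers to files inside it.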

There could be a case to be made for a "source-only" (option 2) and a "minimal" (option 3), although having multiple choices adds complexity. If we agree about a strategy then I could send a message to the mailing list asking for their feedback. No rush, let's think about it.

alandefreitas commented 5 months ago

> I don't believe it's worth the complexity to have a docs-only bundle

I agree. I think Peter asked for it after I did the source-only.

> Just out of curiosity, to learn, what would the effect be of going even further, and removing the entire libs/ and tools/ directories also. Only leaving the boost/ directory. Does that break everything?

Users always need b2 from tools. Libs contains the source files so we also need it.

> Within each library, completely remove the doc/ folder. Currently it is being selective, and inside the doc/ folder, leaving so-called "code" such as a Jamfile or a .hpp file, but removing quickbook, html, odg, images. Of what use is the remaining Jamfile? Who would ever need that for anything? When quickbooks and images are gone. At that point, a few remaining stray files are useless.

Mmmm... IIRC, I think it does that because Jamfiles outside doc refer to this doc file and things break. On the other hand, I think I did remove the tests from the release somehow (I don't remember if that's still in the source-only variant).

The reason I wanted to remove test was because it was an extreme case. One or two libraries contain tests that take most of the space in the whole release. I'll look at the release again with something like wiztree.
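As a scriptable alternative to a tool like wiztree, a short sketch that totals file sizes per directory prefix can show which libraries' test/ folders dominate. The function name size_by_prefix and the default depth of 3 (for example libs/<library>/test) are assumptions made for illustration.

```python
import os
from collections import Counter


def size_by_prefix(root, depth=3):
    """Sum file sizes under `root`, grouped by the first `depth` path parts.

    Keys use "/" separators; files directly under `root` are grouped as ".".
    """
    totals = Counter()
    for dirpath, _, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        key = "/".join(parts[:depth]) or "."
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # skip symlinks, count real files
                totals[key] += os.path.getsize(path)
    return totals
```

Calling size_by_prefix(path).most_common(10) on an unpacked release then lists the ten largest prefixes in bytes.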

> Consider other folders such as test/ and example/. In those cases, "code" remains, but anything that isn't code, such as txt, tar, json, and README files, is gone. That means the examples and the tests are probably 50% broken. Who is going to try to use examples/ or tests/ when files are missing?

I think the main problem here was Jamfiles referencing these folders which breaks the build process when they don't exist. I think I'll try to come up with a script that works directly on top of the release from the website to filter these files. Then we can experiment more easily with it.
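A first cut of such a script could be sketched with the standard tarfile module: read the published archive and copy only the members that pass a filter into a new archive. Everything named here (filter_archive, not_under_doc, the file names in the usage note) is hypothetical, and the filter shown is just one possible rule.

```python
import tarfile


def filter_archive(src, dst, keep):
    """Copy members of tar archive `src` into gzip tar `dst` when keep(name) is true."""
    with tarfile.open(src, "r:*") as tin, tarfile.open(dst, "w:gz") as tout:
        for member in tin:
            if not keep(member.name):
                continue
            # Only regular files carry data; directories and links are
            # copied as headers only. A kept link whose target was
            # filtered out would dangle, which this sketch does not check.
            fileobj = tin.extractfile(member) if member.isfile() else None
            tout.addfile(member, fileobj)


def not_under_doc(name):
    """Example filter: drop any member with a path component named 'doc'."""
    return "doc" not in name.split("/")
```

For example, filter_archive("full.tar.gz", "nodoc.tar.gz", not_under_doc) would produce a variant to unpack and build with b2, so different filters can be tried without touching the release scripts.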

> it saves around 25MB

Yes. That's nice but I was looking for something more extreme. Like the complete thing being around 20MB. That would be nice even in CI because you could just download everything instead of going through depinst.py.

> What are viable options that could be published with minimal controversy or confusion to end-users

Yes. The steps you proposed are a good idea. I'll do some more experiments locally. I can work on the filters then try to build everything with b2 and keep doing that to ensure nothing breaks.