GitoxideLabs / gitoxide

An idiomatic, lean, fast & safe pure Rust implementation of Git
Apache License 2.0
8.91k stars 303 forks source link

Unclear intent regarding optimizations for some profiles #1477

Closed EliahKagan closed 2 months ago

EliahKagan commented 2 months ago

Current behavior 😯

There are a few conditions related to build profiles for gitoxide defined in the top-level Cargo.toml that seem like they can be addressed when—as planned in the comment discussion in #1475—splitting the release profile into separate release and prof (or profile or profiling) profiles.

  1. [ ] Debug symbols - as discussed.
  2. [ ] Unclear intent for LTO - #lto = "fat".
  3. [ ] Old plan for optimizations in debug builds.

1. Debug symbols

Currently the release profile leaves debug symbols in because they are useful for profiling, but this creates the need to strip them in a separate step when they are not wanted, such as when building binaries to publish in releases on GitHub.

This is as discussed in comments in #1475, with a suggested plan in https://github.com/Byron/gitoxide/pull/1475#issuecomment-2256168338.

2. Unclear intent for LTO

It is not clear if link-time optimization, at higher than the default very limited level, should be done. It's possible this might even differ between release and prof profiles. (This is the main sub-issue that may need examination and discussion, and the main reason I opened this issue.)

Early on, in 07859cb, various performance-related configuration was set for the release profile. This included the most aggressive link-time optimization available—short of LTO across different languages—which was still in place about a year ago:

https://github.com/Byron/gitoxide/blob/71efcbba112376b4acaf37d662cdb38d369462be/Cargo.toml#L185

Then in bd32e39 (#893) it was commented out, with the effect of changing the lto setting from fat to the default of false, which somewhat unintuitively still performs link-time optimization, just only within the crate itself.

https://github.com/Byron/gitoxide/blob/1d37bf6a773d56eea9003aa626ced413e8e0eaa3/Cargo.toml#L209

For the last year, it's been that way. Since the gitoxide project is made up of many crates, which the gitoxide binary crate itself uses, I suspect this does not leverage most of the benefit of LTO. That is, the word "false" may be intuitively accurate in this case.

Perhaps benchmarking was done and found LTO not to be helpful for gix and ein. (In other software that uses any gitoxide crates, that software would of course set whatever LTO-related configuration is wanted.) But the commit and PR do not state a reason for the change, and they are otherwise not conceptually related to LTO or optimization. If the goal is to decrease build times, then thin could be used, which is documented to be almost as good as fat while being faster to perform.

Another possibility is that LTO was turned off because the resulting optimizations decreased the detail available when profiling or made the results hard to interpret. If that was the reason, then I think that, when splitting release into release and prof, this would be resolved by setting different values of lto in the two different profiles. release could have fat or thin, while prof could stick with the default of false.

3. Old plan for optimizations even in debug builds?

This would not affect the release or prof configuration, but I suspect it could be clarified at the same time.

a5b1cd3 added:

https://github.com/Byron/gitoxide/blob/1d37bf6a773d56eea9003aa626ced413e8e0eaa3/Cargo.toml#L217-L219

It seems intuitively odd to compile dependencies with a higher optimization level than the code that is using them, especially since this may make stack traces less rich. On the other hand, maybe it is to make test runs faster.

We are already doing it, with the even more aggressive level of 3, in the dev profile for particular dependencies, most of which are even gix-* crates. This was done in 4a948d0 (after having been attempted in #190 per a22c95b and 85f1a48) with the explicit purpose of making some tests faster. I am not suggesting that this be changed, since its purpose is clear from the commit message and running tests faster has a number of benefits.

https://github.com/Byron/gitoxide/blob/1d37bf6a773d56eea9003aa626ced413e8e0eaa3/Cargo.toml#L196-L205

But one would expect that and the commented-out opt-level = 2 for * to be right next to each other. Instead, the release profile is written in between them. Although the commit message in a5b1cd3 makes clear that this was not why that commented-out code was introduced, I wonder if part of the reason it has been preserved as a comment for so long, rather than eventually being uncommented or removed, could be confusion with this other code in the release configuration:

https://github.com/Byron/gitoxide/blob/1d37bf6a773d56eea9003aa626ced413e8e0eaa3/Cargo.toml#L215

To be clear, I'm not at all advocating changing that, which was introduced in add2e3e (#2) and makes it so dependencies that are only used when building and not used when running the code are not needlessly optimized.

When adding a prof profile, it would go right next to the release profile, and having both of them split the two sections that customize dev profile configuration (doing so even in a related way) would exacerbate the current unclear situation. I'd like to remove whatever is not planned or adjust comments to reflect the goal of code that is wanted for the future but not yet, and then if applicable put the most related things together.

Expected behavior 🤔

See above. It seemed clearer to include this for each subsection.

(I considered opening multiple issues or opening no issue for anything here and just bringing it all up in comments, but I figured this might still be best, even though it goes a bit against the format, because there is at least one other issue related to the comments there, #1478, that should be covered separately and should not be confused with any of this.)

Git behavior

I have not researched this. Some choices of optimization level or the treatment of debug symbols might be illuminating. But the tooling and language are different, and for the upstream Git project, I believe there are no official binary releases. So it seems largely inapplicable.

Steps to reproduce 🕹

Since this is a clarity and release-engineering issue opened for tracking and feedback about things I think are mostly known, reproduction consists mainly of examining the code shown, linked, or otherwise discussed above and in the related portion of the #1475 comment discussion.

One practical note, though: On some platforms, the file command will reveal debug symbols, including text like "not stripped" in its output when they are present. But it seems the file command on macOS may not do this.

Byron commented 2 months ago

Thanks for opening this issue to figure out how releases should be configured.

Perhaps benchmarking was done and found LTO not to be helpful for gix and ein. (In other software that uses any gitoxide crates, that software would of course set whatever LTO-related configuration is wanted.) But the commit and PR do not state a reason for the change, and they are otherwise not conceptually related to LTO or optimization. If the goal is to decrease build times, then thin could be used, which is documented to be almost as good as fat while being faster to perform.

Another possibility is that LTO was turned off because the resulting optimizations decreased the detail available when profiling or made the results hard to interpret. If that was the reason, then I think that, when splitting release into release and prof, this would be resolved by setting different values of lto in the two different profiles. release could have fat or thin, while prof could stick with the default of false.

Sorry for the mess. As far as I remember, the LTO setting was to slow to compile so eventually I commented it out, leaving it to the standard that should be fine. For optimizing build times, thin might be better and I wouldn't object that.

In theory, the choice of build profiles should be made so that nobody is surprised and gets served well.

When adding a prof profile, it would go right next to the release profile, and having both of them split the two sections that customize dev profile configuration (doing so even in a related way) would exacerbate the current unclear situation. I'd like to remove whatever is not planned or adjust comments to reflect the goal of code that is wanted for the future but not yet, and then if applicable put the most related things together.

Yes, please feel free to clean that up and remove commented-out parts with impunity. If I remember correctly, optimizing all dependencies unconditionally felt too slow at the time, so I settled for a subset which seems to still be the right choice in retrospect. However, it's too long to really know for sure it has the desired performance impact, even though the current setting doesn't seem to make it too bad.

Some additional thoughts

The real question is: what should happen if cargo install gitoxide is done? Generally, I like having those binaries in an unstripped form in case there is a panic. And also, these binaries shouldn't take forever to compile, which speaks against ever using lto = "fat" there. In a way, this would mean --release should remain as the default and other profiles should be added to satisfy specific needs.

For instance, when building for GitHub releases, one might want to use all optimizations aggressively and strip symbols. When profiling, one might want additional debug assertions but also release optimizations (but nothing that increases build times crazily).

EliahKagan commented 2 months ago

I've opened #1495, but as alluded to there and detailed below, I think significant changes might be wanted before considering that to address this issue in the best way.

As far as I remember, the LTO setting was to slow to compile so eventually I commented it out, leaving it to the standard that should be fine. For optimizing build times, thin might be better and I wouldn't object that.

How slow is too slow? With lto = "fat" and codegen-units = 1, which are the more aggressive optimizations that seem like it might make sense to set for infrequent builds that are allowed to be slow, it seems to me that the builds may actually not be too slow, even for the release profile.

I tested three configurations, with results and some further explanation in this gist:

Please let me know if you would like more of the information from that gist to be here (or perhaps in a new discussion post or somewhere).

lto = "thin" seems about halfway between the two others, which is not as fast as I had expected. But even the slowest seems okay.

Most of the slowdown seems due to less parallelism. I am inferring this from the total processor time versus total elapsed time. The bulk of the effect may be due to codegen-units = 1. I can test it without that, but since the main reason to avoid that is to get a faster and more parallelizable build, that may not need to be investigated specifically if all these timings are acceptable.

To be clear, I did not benchmark the code itself, I only tested how long it took to build it. I suspect that lto = "fat" may not provide all that much more runtime speed over lto = "thin", since I have heard that is usually the case and the documentation suggests it.

I am wondering if lto = "fat" has become faster over time, or common hardware has become faster, or a combination of the two, such that it might be more worth considering than when you tried it out. Do you remember how much slower it had made things?

In theory, the choice of build profiles should be made so that nobody is surprised and gets served well.

In the same way that one of the profiles cargo automatically defines is test and it inherits from dev, another it automatically defines is bench and it inherits from release.

So I think the least surprising way to do it might be to go with the original idea of having release be optimized as aggressively as is reasonable and, assuming any profile is to have its symbols, for release to be that profile, and for this to be undone in bench, which could be used for profiling (i.e., bench rather than prof, profile, or profiling).

If I remember correctly, optimizing all dependencies unconditionally felt too slow at the time, so I settled for a subset which seems to still be the right choice in retrospect.

This could be revisited along with the lto question if the modest slowdown is to be considered acceptable. Though here the goal is only to speed up the tests, so the running time of cargo clean followed by cargo nextest run --all --no-fail-fast could be checked to find out what makes it faster.

Another option would be to not use any optimized dependencies in the dev configuration but to move that customization to a separate test configuration. On the other hand, the complexity of more nonequivalent configurations may not be worth it, and having dev and test work exactly the same is a benefit in manually investigating test results (especially if one assumes they are the same, as they usually are).

The real question is: what should happen if cargo install gitoxide is done? Generally, I like having those binaries in an unstripped form in case there is a panic.

When users install that way, do they want that additional information? If so, then perhaps the debug info should be kept in even in the GitHub releases.

I think I would usually expect the practical differences between cargo install gitoxide and cargo binstall gitoxide to be:

Knowing that various build workflows such as those of ripgrep and gitoxide strip binaries causes me, as a result of that, to be aware that another difference could be the absence of debug info in the binaries. But this is not something I would have expected before, or that I would expect in general or across languages.

How great is the benefit of stripping the symbols? And is there any way to provide them in separate files instead, as downstream distributions sometimes do? Is that worth doing?

Byron commented 2 months ago

I arrived here late, after merging the corresponding PR, but want to pick up on a few parts nonetheless.

I am wondering if lto = "fat" has become faster over time, or common hardware has become faster, or a combination of the two, such that it might be more worth considering than when you tried it out. Do you remember how much slower it had made things?

To my mind, a good way to go about this is to crank up everything to 11 for GitHub releases as build-speed doesn't really matter there or is well within limits. To my mind, if builds take a little less than an hour, it's still good enough.

Important operations, like cloning and checkouts, could then be compared from time to time to see if GitHub builds are measurably faster than local release builds. If so, it's probably working and worth it, even if it's just a percent or two.

In the same way that one of the profiles cargo automatically defines is test and it inherits from dev, another it automatically defines is bench and it inherits from release.

That's very interesting and I wan't aware of that.

So I think the least surprising way to do it might be to go with the original idea of having release be optimized as aggressively as is reasonable and, assuming any profile is to have its symbols, for release to be that profile, and for this to be undone in bench, which could be used for profiling (i.e., bench rather than prof, profile, or profiling).

Indeed, using bench could be the profile used in actual profile runs, and it wouldn't have its symbols stripped. Personally, I think release also should keep its symbols, and if it panics it's good to see where it is coming from.

Another option would be to not use any optimized dependencies in the dev configuration but to move that customization to a separate test configuration. On the other hand, the complexity of more nonequivalent configurations may not be worth it, and having dev and test work exactly the same is a benefit in manually investigating test results (especially if one assumes they are the same, as they usually are).

I also think that having these optimizations in dev has not been an issue until now, and tests work well (and fast enough). It's something to keep, I think.

How great is the benefit of stripping the symbols? And is there any way to provide them in separate files instead, as downstream distributions sometimes do? Is that worth doing?

Reducing the binary size probably is the primary driver for stripping symbols, and as panics are rare or never happen, they might effectively be useless. Those who want symbols for profiling or debugging probably build their own binary (or can be expected to install with cargo install).

So probably these binaries should still be stripped based on that valuation. Personally, I wouldn't provide debug symbols either as it's even more files on the downloads page then, and people probably are faster just recompiling with cargo install than to download these and put them into the right place.

EliahKagan commented 2 months ago

Personally, I wouldn't provide debug symbols either as it's even more files on the downloads page then

Sorry about the confusion there. Providing them as separate files (and obtained by separate downloads) is, in hindsight, the most salient element of the example I linked to, but actually I did not mean to advocate that gitoxide do that.

Instead, I was thinking that if the release profile were somehow to put the symbols in separate files (on all platforms, I mean, and not just on Windows where, at least with MSVC, that always happens), then it might not be necessary to have two release-like profiles.

I think that at least if lto = "fat" confers runtime performance advantages over lto = "thin" then probably there should still be two profiles. But if it turns out that the balance of build time versus runtime performance favors all the same choices for both, such that it comes down to whether symbols are included, then this could be a solution.

I would still not ship the symbols as separate downloads, and likely not even include them in the existing archives either. Though there might be an argument for doing the latter, since while it would increase the size of the downloads, it would not increase the amount of memory the binary image takes up when the program is run. This it would not have even a slight negative effect on CPU cache performance, paging, or anything else affected by executable size.

Having the symbols is useful in case of panics even outside of profiling, but it is not obvious to me why users would want this when installing from source with cargo install gitoxide but not want it just as much installing a binary with cargo binstall gitoxide. One possible solution, not involving the creation of separate symbol files, might be to firm up the release-github profile, under that or another name, and document it as reasonable to use externally.

(Even if so, I should probably benchmark runtime performance before doing anything that would either commit gitoxide to maintaining a separate profile or even that users might wrongly assume commits gitoxide to doing so.)

Byron commented 2 months ago

Having the symbols is useful in case of panics even outside of profiling, but it is not obvious to me why users would want this when installing from source with cargo install gitoxide but not want it just as much installing a binary with cargo binstall gitoxide.

It's really just me who would want to have panics and cargo install is the way I install gix on my systems. It's true that seeing symbols on panic should always be useful, especially while gix (the crate) isn't stable.

Thus far, GitHub releases were optimized for download size, so maybe, to unify this, we just don't strip the binaries as a start. Those who don't want them can then strip them themselves.

Finally, if there is a benefit in having minimally-sized archived, these could be created separately, for instance for use in GitHub actions that would need to startup as fast as possible. Maybe doing that also saves time (and nerves :) (?)) on your side.