build2 / build2

build2 build system

Do not use the install program for dist and potentially for install #190

Open boris-kolpackov opened 2 years ago

boris-kolpackov commented 2 years ago

We currently use the install program for the dist meta-operation, similar to the install operation. While it's not the fastest on Linux, install.exe is really, really slow on Windows. And while it's not a big deal when preparing archives, dist is also used to check out a package from a VCS-based repository. As a reference point, a checkout of libfreetype with ~400 files takes more than a minute on Windows.

On Windows install.exe is actually an MSYS2/Cygwin executable which goes through some undocumented Windows API in order to recreate POSIX file overwrite semantics (for example, it is capable of overwriting a running executable file, yes, really). But, we don't really need this for dist (unlike install). In fact, in dist the first thing we do is rmdir_r() the target directory. So it seems there is no reason not to just use our file copying functions instead of running install.
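To illustrate the idea, here is a minimal sketch using std::filesystem rather than build2's own file copying functions. The name dist_copy() and its interface are hypothetical and not taken from build2; the sketch only shows why wiping the target first makes plain copies sufficient. It assumes the target is not located inside the source tree and does not handle symlinks.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper (not build2's actual code): recreate dst from scratch
// and copy the tree rooted at src into it. Returns the number of regular
// files copied.
std::size_t dist_copy (const fs::path& src, const fs::path& dst)
{
  assert (fs::is_directory (src));

  // Wipe the target first, mirroring the rmdir_r() step described above.
  // After this nothing can be overwritten, so no POSIX overwrite semantics
  // are needed.
  fs::remove_all (dst);
  fs::create_directories (dst);

  std::size_t n (0);
  for (const fs::directory_entry& e: fs::recursive_directory_iterator (src))
  {
    fs::path t (dst / e.path ().lexically_relative (src));

    if (e.is_directory ())
      fs::create_directories (t);
    else if (e.is_regular_file ())
    {
      fs::create_directories (t.parent_path ());
      fs::copy_file (e.path (), t); // Throws if t exists, which it cannot here.
      ++n;
    }
  }

  return n;
}
```

Because the copy never has to replace an existing file, none of the special overwrite handling that install.exe provides comes into play.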

boris-kolpackov commented 2 years ago

Perhaps while at it we should also add a "fast install" mode or some such along similar lines? Maybe we can even automatically enable it if we see the target directory does not exist or is empty (which will probably be quite common on Windows).

boris-kolpackov commented 1 year ago

Ok, I've implemented this for the dist meta-operation (note that the old behavior can still be requested with config.dist.cmd=install). The changes are available in the latest staged toolchain. Please test and report any issues.

boris-kolpackov commented 1 year ago

BTW, doing something like this for install has another issue: sudo (config.install.sudo). So it seems the criteria which we will have to use if we do this for the install operation would be something like this:

  1. There is no sudo program.

  2. If on Windows, the install root directory is empty (see above for why).

We could probably relax (2) a bit and just fall back to install.exe if the destination file already exists.
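A rough sketch of how these criteria, including the relaxed form of (2), might be expressed. The enum, the function name, and the parameters are hypothetical and only illustrate the decision; this is not how build2 implements it.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

enum class install_method {builtin_copy, install_program};

// Hypothetical decision helper. sudo is the value of config.install.sudo
// (empty if unset), windows tells whether we are running on Windows, and
// dst is the destination file about to be installed.
install_method
choose_method (const std::string& sudo, bool windows, const fs::path& dst)
{
  assert (!dst.empty ());

  // (1) With sudo configured we have to run an external program under it.
  if (!sudo.empty ())
    return install_method::install_program;

  // (2), relaxed: on Windows only the overwrite case needs install.exe.
  if (windows && fs::exists (dst))
    return install_method::install_program;

  return install_method::builtin_copy;
}
```

With this per-file check, an install root that already contains unrelated files would still take the fast path for every file that does not exist yet.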

Klaim commented 1 year ago

> We could probably relax (2) a bit and just fall back to install.exe if the destination file already exists.

That seems reasonable to me, and it should work well in most cases, such as when the directory contains unrelated packages and the user wants to set up some kind of dependency directory where everything is installed.

I will do proper testing and comparisons on Windows today and report back. I didn't see much of a difference on Linux, except that, even when depending on tons of Boost libraries, the configuration went fast (with that staged version). I didn't think to compare before and after upgrading there, though, as it's less of a problem than on Windows.

Klaim commented 1 year ago

I observed a great performance improvement on Windows 11 with the latest staged version, using a "worst case but realistic" project/scenario.

I set up a bdep new -t lib project and added the following:

I also made the library dependent on these dependency libraries. Note that Qt6 is acquired through git.

I only ran bdep init -C ../build-msvc cc and captured that on video for measuring (so it's not perfect, but I wanted to capture the user's experience). I removed .bdep and all the configuration directories between measurements.

Before upgrading to the latest staged build2, it took 7'47'' (from 0'13'' to 7'50'' in the recording) to run that command. I did it only once. (Note that my computer is on the high end of the spectrum and was not doing anything other than capturing the video.) After upgrading build2 to the latest staged version, it took 1'40'' (from 0'18'' to 1'58'' in the recording). I was surprised, so I ran it a second time after removing all the build directories; that time I got 2'00'' (from 2'07'' to 4'07'' in the recording). Keep in mind that networking etc. makes the results fluctuate, but that's still a massive difference.

That new-version recording also shows that the new bottleneck is acquiring the upstream/ git submodule from Qt6, which is a lot of files (and history?). The previous-version recording shows a lot of time spent on distributing; in the new version that time is negligible.

Good job! :+1:

The recordings (including details on the context) are available here (you might have to download them; gdrive hasn't processed them yet):

It would be useful if some other people tested with real and "big" projects to see whether they get similar improvements.

boris-kolpackov commented 1 year ago

Nice, thanks for testing!

> That new-version recording also shows that the new bottleneck is acquiring the upstream/ git submodule from Qt6

The actual fetching is pretty fast. That long pause between that and dist is, I believe, symlink processing that we have to perform on Windows and which involves asking git for properties of each file in the repository. We have already optimized that with batching (i.e., we ask for a whole bunch of files at once), but it's still quite long for something like Qt.
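For illustration, the batching idea can be sketched as follows. The function name for_each_batch(), the batch size, and the query callback are all hypothetical; the actual git invocation and build2's real implementation are not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: instead of issuing one query per file, hand the files
// to the query callback in chunks of at most batch_size entries. Returns the
// number of queries issued.
template <typename F>
std::size_t
for_each_batch (const std::vector<std::string>& files,
                std::size_t batch_size,
                F&& query)
{
  assert (batch_size != 0);

  std::size_t calls (0);
  for (std::size_t i (0); i < files.size (); i += batch_size)
  {
    std::size_t e (std::min (i + batch_size, files.size ()));

    auto b (files.begin ());
    query (std::vector<std::string> (b + static_cast<std::ptrdiff_t> (i),
                                     b + static_cast<std::ptrdiff_t> (e)));
    ++calls;
  }

  return calls;
}
```

The number of queries drops from one per file to roughly the file count divided by the batch size, but each query still has to cover every file, which is consistent with the step remaining slow for a repository as large as Qt.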