Closed GoogleCodeExporter closed 9 years ago
The human cost of figuring out what went wrong and restarting a build with
--force_dirs is substantial.
To make this worthwhile, you would need to demonstrate that there is a very
significant performance advantage, which I doubt is the case.
Original comment by fer...@google.com
on 16 Apr 2012 at 11:17
> The human cost of figuring out what went wrong and restarting a build
> with --force_dirs is substantial.
Agreed. How about this instead: invert the logic so that the default
behavior stays the same, but forcing can be disabled via --no_force_dirs.
> To make this worthwhile, you would need to demonstrate that there is a
> very significant performance advantage, which I doubt is the case.
Disabling the include_server dir-forcing does provide a measurable advantage
for my use case -- building the Linux kernel on a large number of virtual cloud
hosts which are bandwidth-throttled by the cloud provider. By the end of the
build, the transmission of all the accumulated forcing files consumes more time
and bandwidth than the actual source files. I think the problem would similarly
affect any case where the network is fully saturated, or where network I/O is
expensive.
Note also that the code comments in include_analyzer.py already recognize this
issue (see below), but the more significant problem is probably the forcing
files, not the links. (Maybe the comment was written before the forcing files
were added to the "send 'em all every time" model?)
# Links are accumulated intra-build (across different compilations in a
# build). We send all of 'em every time. This will potentially lead to
# performance degradation for large link farms. We expect at most a
# handful. We add put the system links first, because there should be very
# few of them.
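
To illustrate why "send all of 'em every time" can get expensive as a build progresses, here is a toy model. It is not distcc code: the function name, the constant rate of new files per compilation, and the numbers are all assumptions made for illustration, and it counts files rather than bytes.

```python
def total_files_sent(num_compilations, new_per_compilation, resend_all):
    """Count files transmitted over a whole build (toy model).

    resend_all=True  : every compilation re-sends everything accumulated so far.
    resend_all=False : every compilation sends only its own new files.
    """
    accumulated = 0
    sent = 0
    for _ in range(num_compilations):
        accumulated += new_per_compilation
        sent += accumulated if resend_all else new_per_compilation
    return sent


if __name__ == "__main__":
    # Hypothetical build: 1000 compilations, each adding one new file.
    print("resend all:", total_files_sent(1000, 1, resend_all=True))
    print("new only:  ", total_files_sent(1000, 1, resend_all=False))
```

Under these assumptions the resend-all total grows quadratically with the number of compilations, while the new-only total grows linearly, which matches the observation that the cost is most visible toward the end of a long build.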
Original comment by kamal@whence.com
on 17 Apr 2012 at 4:58
> How about this instead then?:
> Invert the logic such that the default behavior stays the same but forcing
> can be disabled via --no_force_dirs.
If this really improves performance significantly, then that sounds reasonable.
Can you quantify how much speedup you are seeing with this patch?
Original comment by fergus.h...@gmail.com
on 17 Apr 2012 at 5:42
Here is the revised --no_force_dirs patch for review (leaves the default
behavior as it was, adds new option --no_force_dirs).
> Can you quantify how much speedup you are seeing with this patch?
I measured the total network bytes transferred and the elapsed real time for my
pump-mode build of vmlinux*, with and without
INCLUDE_SERVER_ARGS="--no_force_dirs":
                          client net traffic     elapsed
                          TxBytes    RxBytes     real time
                          -------    -------     ---------
standard distcc r765      142 MB     159 MB      613 sec
with --no_force_dirs      126 MB     159 MB      594 sec

... result: 16 MB less TxB (11%), 3.1% speed-up
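
For reference, the result line can be reproduced from the table with a few lines of arithmetic. This is only a sketch; the helper name `savings` is made up here, and the inputs are the figures reported above.

```python
def savings(baseline, patched):
    """Return (absolute reduction, reduction as a percentage of baseline)."""
    diff = baseline - patched
    return diff, 100.0 * diff / baseline


# Figures taken from the measurement table above.
tx_saved_mb, tx_saved_pct = savings(142, 126)    # TxBytes in MB
time_saved_s, time_saved_pct = savings(613, 594)  # elapsed real time in seconds

if __name__ == "__main__":
    print(f"TxBytes: {tx_saved_mb} MB less ({tx_saved_pct:.1f}%)")
    print(f"elapsed: {time_saved_s} sec less ({time_saved_pct:.1f}%)")
```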
Also, because the leaked forcing_technique files accumulate over the course of
the build, the speed-up delivered by --no_force_dirs becomes even more
pronounced if I build the optional kernel modules along with 'vmlinux'.
Of course, the performance benefit figures are peculiar to this vmlinux build
and my test configuration, so YMMV. But since --no_force_dirs will reduce
TxBytes for all non-trivial builds, it should result in some degree of overall
speed-up -- more so when the network is the bottleneck for whatever reason.
*This is approximately "pump make vmlinux -j64 CC=distcc" with a single 4-core
remote distccd server. For the record, I have hacked my linux-3.0.0 source
tree to use some header files which are pre-installed on the client and server
(without which this build transfers a whopping 2.5 GB of TxBytes and takes
about twice as long).
Original comment by kamal@whence.com
on 18 Apr 2012 at 8:23
Patch (include-server-no-force-dirs.2.patch) applied as SVN r767:
New option INCLUDE_SERVER_ARGS="--no_force_dirs" disables the use of
forcing_technique files; default behavior is unchanged.
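
As a rough illustration of the "default unchanged, opt-out flag" design, the sketch below parses an INCLUDE_SERVER_ARGS-style string with argparse. It is not the applied patch: the real include server has its own option handling, and the function name and the `force_dirs` attribute here are assumptions made for the example.

```python
import argparse
import os
import shlex


def force_dirs_enabled(env=None):
    """Return True unless --no_force_dirs appears in INCLUDE_SERVER_ARGS.

    Hypothetical helper for illustration; not part of distcc.
    """
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(add_help=False)
    # Negative flag: forcing stays on by default and is only switched off
    # when the user explicitly asks for it.
    parser.add_argument("--no_force_dirs", dest="force_dirs",
                        action="store_false", default=True)
    raw = env.get("INCLUDE_SERVER_ARGS", "")
    # Ignore any other options that may be present in the variable.
    args, _unknown = parser.parse_known_args(shlex.split(raw))
    return args.force_dirs


if __name__ == "__main__":
    print(force_dirs_enabled({}))
    print(force_dirs_enabled({"INCLUDE_SERVER_ARGS": "--no_force_dirs"}))
```

Making the flag negative means existing builds that never set it keep the current behavior.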
Original comment by kamal@whence.com
on 20 Apr 2012 at 4:08
The patch looks fine.
However, 3% isn't huge. Is this worth the complexity of an additional option
that only works some of the time?
Would it be possible to find a solution that only sends these files when needed?
If so, that would be a lot nicer way to fix this.
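
One possible shape for "only send these files when needed" is sketched below. It assumes, without confirmation from this thread, that a forcing file exists only to make sure a directory is present on the server, so it can be skipped whenever a real file is already being sent somewhere under that directory. The function and argument names are hypothetical and do not come from include_analyzer.py.

```python
import posixpath


def dirs_needing_forcing(search_dirs, files_to_send):
    """Return the search dirs that no transmitted file would create.

    Assumption: sending a file implicitly creates all of its parent
    directories on the server, so those directories need no forcing file.
    """
    populated = set()
    for path in files_to_send:
        d = posixpath.dirname(path)
        # Walk up the tree, marking every ancestor directory as populated.
        while d and d not in populated:
            populated.add(d)
            parent = posixpath.dirname(d)
            if parent == d:  # reached the filesystem root
                break
            d = parent
    return [d for d in search_dirs if d not in populated]


if __name__ == "__main__":
    files = ["/usr/include/stdio.h", "/src/a/b/x.h"]
    dirs = ["/usr/include", "/src/a", "/src/empty"]
    print(dirs_needing_forcing(dirs, files))
```

Whether this is safe depends on how the server lays out the files it receives, which would need checking against the actual implementation.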
Original comment by fer...@google.com
on 20 Apr 2012 at 4:31
Original issue reported on code.google.com by
kamal@whence.com
on 16 Apr 2012 at 9:49