arthurhsliu / distcc

Automatically exported from code.google.com/p/distcc
GNU General Public License v2.0

Apparently not getting much parallelism #136

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Answering the following questions is a big help:

1. What version of distcc are you using (e.g. "2.7.1")?  You can run "distcc 
--version" to see.  If you got distcc from a distribution package rather than 
building from source, please say which one.

distcc --version
distcc 3.2rc1 i386-apple-darwin13.1.0
  (protocols 1, 2 and 3) (default port 3632)
  built Dec 25 2013 16:30:21

2. What platform are you running on (e.g. "Red Hat 8.0", "HP-UX 11.11")?  What 
compiler are you using ("gcc 3.3")?  Run "uname -a" and "cc --version" to see.

I’m on Mac OS X 10.9 using clang-503.0.22

3. What were you trying to do (e.g. "install distcc", "build Mozilla")?

Build Clang and LLVM.

4. What went wrong?  Did you get an error message, did it hang, did it build a 
program that didn't work, did it not distribute compilation to machines that 
ought to get it?

I’ve seen in the logging that the daemon detects both my local and remote 
hosts as having 8 CPUs and allocates 10 jobs each.  However, if I watch the 
number of clang jobs running on the remote host at any time, it is usually 
zero.  Sometimes it will get a job or three.  If I have DISTCC_VERBOSE=1, I see 
that it occasionally complains about the remote machine’s “slots” being 
busy, and it reports on slots 0-3.  Presuming that these have some relationship 
to the jobs it will run, I’m surprised to see only four slots on an 8-core 
machine.  The machine is of course really 4 cores with hyperthreading, so maybe 
there’s a detection problem?  Running distccd -j10 on the remote machine 
doesn’t seem to make much difference to the result; there are still only 3 
slots reported as busy.  If this were only about mis-allocating parallelism, 
though, it would probably be running 3 or 4 clang jobs at most times during the 
build, right? At any rate, this is much slower than building locally without 
distcc.

Original issue reported on code.google.com by HamsNo...@gmail.com on 6 Feb 2014 at 7:07

GoogleCodeExporter commented 9 years ago
Try using the /LIMIT field of the host specification in your DISTCC_HOSTS 
environment variable, e.g. DISTCC_HOSTS="localhost/16 myremotehost/16,lzo,cpp".
The default limit is 4 jobs (2 for localhost).

Original comment by fergus.h...@gmail.com on 6 Feb 2014 at 8:01
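To make the /LIMIT advice concrete, here is a simplified Python sketch (not distcc's own parser) of how a DISTCC_HOSTS string breaks down into per-host job limits. `parse_distcc_hosts`, `HostSpec` and `total_slots` are hypothetical names; the defaults of 4 and 2 are the ones quoted in the comment above, and only the host forms that appear in this thread (bare names, `@host` for SSH, `/LIMIT`, `,lzo,cpp`, and `--` options such as `--randomize`) are handled.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Default limits as described in this thread: 4 jobs per host, 2 for localhost.
DEFAULT_LIMIT = 4
DEFAULT_LOCALHOST_LIMIT = 2


@dataclass
class HostSpec:
    host: str
    limit: int
    ssh: bool = False
    options: List[str] = field(default_factory=list)


def parse_distcc_hosts(value: str) -> Tuple[List[HostSpec], List[str]]:
    """Split a DISTCC_HOSTS string into host specs and global '--' options.

    Simplified: understands bare names, '@host' / 'user@host' (SSH),
    a '/LIMIT' suffix and ',option' suffixes such as lzo and cpp.
    Ports and the other forms in the distcc manual are not handled.
    """
    hosts: List[HostSpec] = []
    global_options: List[str] = []
    for token in value.split():
        if token.startswith("--"):
            # Global options such as --randomize or --localslots=N.
            global_options.append(token)
            continue
        name, *options = token.split(",")
        ssh = "@" in name
        if ssh:
            # Keep only the host part after '@'.
            name = name.split("@", 1)[1]
        if "/" in name:
            name, limit_text = name.split("/", 1)
            limit = int(limit_text)
        else:
            limit = DEFAULT_LOCALHOST_LIMIT if name == "localhost" else DEFAULT_LIMIT
        hosts.append(HostSpec(name, limit, ssh, options))
    return hosts, global_options


def total_slots(hosts: List[HostSpec]) -> int:
    """Sum of per-host limits: in this simplified model, the most compile
    jobs this client will run at once across all listed hosts."""
    return sum(h.limit for h in hosts)
```

Under this model, `parse_distcc_hosts("localhost/16 myremotehost/16,lzo,cpp")` yields limits of 16 and 16 (32 in total), whereas the same two hosts written without `/LIMIT` would get only 2 and 4, which would cap the build at 6 concurrent compiles no matter how many jobs make is asked to run.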

GoogleCodeExporter commented 9 years ago
Okay, that helped a bit, thanks.  It’s not obvious why one would have to do 
this given that the number of potential jobs is specified elsewhere…

It still isn’t great.  If I sample the number of clang jobs on the remote 
server every .5 seconds, I get this:

1, 5, 8, 4, 1, 4, 7, 4, 3, 3, 4, 2, 1, 1, 1, 1, 1, 1, 2, 8, 7, 4, 2, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 13, 10, 6, 5, 5, 2, 
2, 3, 3, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 7, 13, 7, 5, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 5, 9, 11, 15, 
21, 19, 19, 15, 14, 11, 7, 4, 3, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…

I’m also getting lots of this:

distcc[8309] ERROR: compile /Users/dave/src/s/llvm/tools/llvm-diff/DiffConsumer.cpp on @eno/10,lzo,cpp failed
distcc[8309] (dcc_build_somewhere) Warning: remote compilation of '/Users/dave/src/s/llvm/tools/llvm-diff/DiffConsumer.cpp' failed, retrying locally
distcc[8309] (dcc_mark_timefile) mark /Users/dave/.distcc/lock/backoff_ssh_eno_0
distcc[8309] Warning: failed to distribute /Users/dave/src/s/llvm/tools/llvm-diff/DiffConsumer.cpp to @eno/10,lzo,cpp, running locally instead

and it falls back to doing remote compiles without preprocessing.  Ideas?

Original comment by dave@boostpro.com on 7 Feb 2014 at 1:26

GoogleCodeExporter commented 9 years ago
Actually those numbers are at least 1 greater than they should be.  That is, 
the 1’s should all be 0’s.  I forgot to filter out the “grep clang” 
process I was using to count the instances.

Original comment by dave@boostpro.com on 7 Feb 2014 at 1:28
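One way to avoid that off-by-one when sampling is to match process names exactly instead of grepping a process listing, so the sampler never counts itself. The Python sketch below is an illustration, not something used in this thread: `count_commands` and `sample_compiler_jobs` are hypothetical names, and it assumes `ps -A -o comm=` (available on both macOS and Linux) and that the remote compiles run as `clang` or `clang++`.

```python
import os
import subprocess
import time
from typing import Iterable, List

# Compiler names to count. clang++ is included because C++ compiles may
# run under that name; a substring grep for "clang" would match both.
DEFAULT_NAMES = frozenset({"clang", "clang++"})


def count_commands(ps_output: str, names: Iterable[str]) -> int:
    """Count lines of `ps -o comm=` output whose basename is in `names`.

    Comparing exact basenames instead of grepping for a substring means
    the sampling command itself (e.g. "grep clang") is never counted.
    """
    wanted = set(names)
    return sum(
        1
        for line in ps_output.splitlines()
        if line.strip() and os.path.basename(line.strip()) in wanted
    )


def sample_compiler_jobs(samples: int = 10, interval: float = 0.5,
                         names: Iterable[str] = DEFAULT_NAMES) -> List[int]:
    """Return the number of matching processes seen every `interval` seconds."""
    counts = []
    for _ in range(samples):
        # -A: all processes; -o comm=: command name only, no header line.
        result = subprocess.run(["ps", "-A", "-o", "comm="],
                                capture_output=True, text=True, check=True)
        counts.append(count_commands(result.stdout, names))
        time.sleep(interval)
    return counts
```

Running `sample_compiler_jobs(samples=200)` on the remote host during a build would produce a series comparable to the one posted above, but with idle periods reported as 0 rather than 1.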

GoogleCodeExporter commented 9 years ago
Perhaps you don't have the appropriate system header files installed on the 
remote machine?

Try setting DISTCC_FALLBACK=0 to see what error messages you get from the 
remote compilation.

Original comment by fergus.h...@gmail.com on 7 Feb 2014 at 11:03

GoogleCodeExporter commented 9 years ago
OK, that was a big help.  I updated the system configuration to match on both 
the local and remote hosts, and got pump mode working.  However, using 
distccmon-text, there are still really long stretches here and there where all 
the work is happening on localhost, even with --randomize set.  And there are 
never even short stretches where the remote host does all the work.  I’m using

DISTCC_HOSTS="--randomize localhost/10 17.226.35.138/10,lzo,cpp"

There’s nothing obviously wrong there, is there?

Original comment by dave@boostpro.com on 14 Feb 2014 at 5:40

GoogleCodeExporter commented 9 years ago
Distcc only parallelizes preprocessing and compilation, not linking.
Linking is always done on localhost.

Original comment by fer...@google.com on 14 Feb 2014 at 10:43

GoogleCodeExporter commented 9 years ago
Yep, I'm aware of that.  It doesn't explain why there should be long stretches 
with 10 compiles on localhost and none on the remote one, AFAICT.  All the 
linking happens on localhost, and distccmon doesn't show it, right?

Original comment by dave@boostpro.com on 14 Feb 2014 at 3:22

GoogleCodeExporter commented 9 years ago
Well, there's nothing obviously wrong with your config as far as I can tell.

Distcc is open source, so if it is not meeting your needs, please feel free to 
improve it :)

Original comment by fergus.h...@gmail.com on 14 Feb 2014 at 5:49

GoogleCodeExporter commented 9 years ago
I think the following paragraph from the documentation answers this question. 
If you play around with --localslots and --localslots_cpp, you may gain some 
speed:

`There are two special host names --localslots and --localslots_cpp which are 
useful for adjusting load on the local machine. The --localslots host specifies 
how many jobs that cannot be run remotely that can be run concurrently on the 
local machine, while --localslots_cpp controls how many preprocessors will run 
in parallel on the local machine. Tuning these values can improve performance. 
Linking on large projects can take large amounts of memory. Running parallel 
linkers, which cannot be executed remotely, may force the machine to swap, 
which reduces performance over just running the jobs in sequence without 
swapping. Getting the number of parallel preprocessors just right allows you to 
use larger parallel factors with make, since the local machine now has some 
mechanism for measuring local resource usage.`

Original comment by VENKATAK...@gmail.com on 7 Mar 2015 at 4:39
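For reference, here is a DISTCC_HOSTS sketch that adds both special hosts to the configuration posted earlier in this thread. The `--localslots=N` / `--localslots_cpp=N` spelling is assumed to follow the host-specification syntax in the distcc manual, and the numbers are placeholders to experiment with, not measured recommendations.

```shell
# Illustrative values only; tune them for your own machines.
#   --localslots      jobs that cannot run remotely (e.g. links) allowed at once locally
#   --localslots_cpp  preprocessors allowed to run in parallel on the local machine
export DISTCC_HOSTS="--randomize --localslots=2 --localslots_cpp=16 localhost/10 17.226.35.138/10,lzo,cpp"
```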