Closed frerich closed 7 years ago
How do I run clcachesrv.py as a server process from the command line? Will it be possible to dump and load the cache from this process, or how could this be used in a case like AppVeyor where we start from a clean system every time? https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.383/job/5fh1qd2m5h26jf17
The idea is that the server is simply run on the command line. In this rough proof of concept, it opens a named pipe (\\.\pipe\clcache_srv) and waits for incoming messages. There is no way yet to make it save or restore the in-memory data, but maybe we want to add that later.
For what it's worth, I'm currently experimenting with a reimplementation of clcachesrv based on pyuv, Python bindings to libuv. Maybe this simplifies stuff.
@@            master     #248    diff @@
==========================================
  Files            1        1
  Lines         1015     1035     +20
  Methods          0        0
  Messages         0        0
  Branches       171      173      +2
==========================================
+ Hits           919      924      +5
- Misses          68       82     +14
- Partials        28       29      +1

Powered by Codecov. Last update b611ea6...32db227
Closer but still some errors https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.398/job/62a67yy2tiaos9fd
Thanks for the notice - I found an exception in line 3023.
It seems that the server tries to get the hash given a truncated path. I think the problem here is that so much data is sent by clcache (so many paths) that either the buffer size was exceeded (currently 64k) or the server side did not read it in one chunk, i.e. a single 'read()' call was not enough. I'll play with it some more later today.
Thanks for testing the implementation. :-)
Yes, if you search for "fail" you will normally quickly find all the real issues... Just note that I don't always fully understand what I am doing, so if I should test it in some different way just say ;)
So yes, it looks like something is working :) I actually did not expect it to show any speed-up on the second run, since I thought it would be necessary to dump, store and reload the file hashes from clcachesrv to be able to do that...
https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.399 msbuild time 36:27.80 https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.400 msbuild time 09:27.24
@sasobadovinac Thanks for providing those numbers!
If I read it correctly, in build 1.0.399 you built FreeCAD twice, with clcachesrv running. The build finished about 42 minutes later and resulted in a cache 741MB in size and 1221 misses (0 hits).

Later, build 1.0.400 started: clcachesrv is started, and the compile starts. 15 minutes later, the build finished. At this point, the cache is still 741MB in size, with 1221 entries -- and 1221 hits, i.e. the entire build caused cache hits.

That looks like a nice speedup, but I wonder: how much of this is because of setting CLCACHE_SERVER? I.e. how long would a build take (one with an empty cache and then one with a filled cache) if you did not use CLCACHE_SERVER?
@frerich Yes, I do a 64-bit and a 32-bit build every time, and each of them saves/restores its own cache. These are about the "normal" times I am getting for these builds: about 35 min if clcache is not used or the cache is empty (setting up clcache and saving/restoring the cache can add a few minutes, so in this case we get about 42 min), and about 15 min if clcache is used on the same commit (no code change, all cache hits).
So I don't really see any visible speed-up from clcachesrv compared to just running clcache directly. Here is one build to compare https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.355 and one with xxhash https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.360
Interesting! Can you maybe run both a 'cold' and a 'warm' build (i.e. with empty and filled caches), but this time also set the CLCACHE_PROFILE environment variable, and then use the showprofilereport.py script to get a runtime profile? That would allow us to determine where the time is spent for cache misses (i.e. during the cold build) and cache hits (for the warm build).
You can simply set CLCACHE_PROFILE to 1, then run clcache as usual, and at the end just run showprofilereport.py -- it will collect all generated profile data and print a report to standard output.
I am just now running a build with clcachesrv where I have fully cleaned and rebuilt the cache, and will run another build next, since this way it shows the best times to compare (I often just run build after build with different parameters, so the cache gets mixed and big). After that I will run with CLCACHE_PROFILE; should I do that with clcachesrv or clcache? I can do both, I just don't know if today :)
A profile with CLCACHE_SERVER set (and clcachesrv running) would be perfect, thank you!
One more thing came to my mind: I see that in the logs there is also some CMake output, so I guess not all of the time is spent on actually compiling stuff. Maybe you could print some time stamps before/after the actual compilation so that we get a better idea of the relative improvement?
@frerich msbuild writes just the build time at the end, so it is indeed best to compare that ( https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.415/job/fbk1qike6mkxob6l#L8962 ); also, if you hover your mouse over each line of the report, it will show the time :)

But am I doing something wrong, or is clcachesrv not supported by the profiler ( https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.415/job/fbk1qike6mkxob6l#L9252 )?
Thanks for the tip with the tooltips - that's useful! I didn't know that!
As for the issue with running showprofilereport.py: it's a Python script, so I think all that's missing is adding python in front of the command line. Or maybe it was executed in the wrong working directory.
When setting CLCACHE_PROFILE, clcache will generate a .prof file for each invocation which contains the runtime information. The showprofilereport.py script then recursively aggregates all .prof files in the current working directory to build a report.
clcachesrv https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.419 msbuild time 36:05.46 https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.420 msbuild time 09:02.03
clcache https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.421 msbuild time 36:28.62 https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.422 msbuild time 09:56.07
Thanks! Alas, it appears that showprofilereport.py was not called in the right working directory; I see output like this:
python showprofilereport.py
Reading C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\clcache-46129633486cca23d9f911ff568eff96.prof...
Reading C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\clcache-61714fd51d7a8a307763744808a886d8.prof...
Reading C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\clcache-7a083a10cc7a2bb17dcca48e072ba07d.prof...
Sat Nov 26 02:22:16 2016 C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\clcache-46129633486cca23d9f911ff568eff96.prof
Sat Nov 26 02:22:15 2016 C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\clcache-61714fd51d7a8a307763744808a886d8.prof
Sat Nov 26 02:22:15 2016 C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\clcache-7a083a10cc7a2bb17dcca48e072ba07d.prof
48931 function calls (48915 primitive calls) in 0.407 seconds
I.e. the script is called in the directory C:\Program Files (x86)\Microsoft Visual Studio 12.0 or so, not in the actual build directory of your application. Hence, it only picks up three .prof files (i.e. the runtime data of three compiler invocations) and thus generates a very short report. It seems the directory C:\projects\freecad\build might be better; maybe you can give that a try?
Yes, sorry, I ran it from where I run clcache, and when I saw some output I thought it was OK :) Does this one look good https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.423 ? It is just a test to make sure showprofilereport.py is showing correct data; I will rerun the other builds after it...
Yes, that's looking much better. 👍
clcachesrv https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.424 msbuild time 35:57.86 https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.425 msbuild time 09:08.47
clcache https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.426 msbuild time 37:09.57 https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.427 msbuild time 09:53.22
Thanks! Here's a quick analysis of the profile reports:

Without clcachesrv, the profile report says that 1243.841 seconds were spent executing the Python code in clcache; this is about 21 minutes, i.e. longer than the entire job. This probably happens because clcache is called with multiple source files, in which case it reinvokes itself for each source file and builds them concurrently, so the runtime of each concurrent invocation is summed on top of the parent process' runtime. Most of that runtime (924 seconds) is spent in WaitForSingleObject, and 329 of those seconds are caused by the waitForAnyProcess function in clcache waiting for any of the recursively launched processes to finish. So I think it's safe to say that the 'effective' runtime is more like 1243 - 329 = 914 seconds.
With clcachesrv, 990 seconds were spent in the clcache Python code -- 820 seconds in WaitForSingleObject alone! 230 of those seconds are caused by waitForAnyProcess, i.e. if we deduct those we are left with 990 - 230 = 760 seconds. So that would be about a 20% improvement. I wonder whether the runtime of the clcachesrv script should be included, but I don't think so: the runtime of clcachesrv is implicitly contained in the runtime of the getFileHashes function, since it waits for the server.
One thing to add: I believe this is running with 2 CPU cores. This does not yet have the improvements from #244, right?
@sasobadovinac No, it did not -- the commits were based on an older version. I just rebased them on the latest master, so now they include the optimistic locking.
I am unfortunately not getting very far with clcachesrv in my build environment. At about file 30 of 8000+, I get the following error from about 17 clcache processes. This is on my 24-core machine.
File "c:\Temp\\clcache.py", line 758, in getFileHashes
with open(pipeName, 'w+b') as f:
OSError: [Errno 22] Invalid argument: '\\\\.\\pipe\\clcache_srv'
With debug output in clcachesrv I can see that it does do its job for some files. The last output is 4 lines of 'sent response to client, closing connection'. The clcachesrv process did not crash.
This test is from before the rebase to master.
Any idea how I can help to debug?
@akleber Thanks for giving it a try! Unfortunately I'm a bit stumped; I can't think of a good explanation for that symptom. I suppose open() would fail with that error message in case the file is not ready for reading or so? Maybe there are too many concurrent connections to the cache server?
This is also the first time I did anything significant with named pipes or the pyuv module on Windows, so it may well be that I'm not releasing some sort of handle correctly.
A few experiments to trace this down:
I had this error once https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.436/job/2sx9139178n0cbj7#L3536
I did nothing special to trigger it, but it also did not show up again in more recent builds. I did nothing to fix it; I just reran the build
https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.436 https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.437
If this issue is reproducible somehow, I'd love to try opening the named pipe on the clcache side unbuffered, i.e. via
with open(pipeName, 'w+b', 0) as f:
I wonder whether this is caused by some flushing not being done.
I did some more tests:
Did not work with 8 processes either. However, this time I got no error in clcache but in ninja (my build driver): ninja: error: FindFirstFileExA(some_path_here): Access is denied.
Now running with 4 processes.
This time I got no error in clcache but in ninja (my build driver): ninja: error: FindFirstFileExA(some_path_here): Access is denied.
I wonder whether maybe this error is somehow related to the fact that clcachesrv is watching directories for changes. Maybe that causes some directory handle to be opened, and that in turn confuses Ninja?
To disable the file system monitoring, you could open server/clcachesrv.py and remove the line self._startWatching(dirname) in HashCache.getFileHash. That'll make clcachesrv compute and cache hashes, but never invalidate them in case a file changes somehow. For testing, that might be okay.
I now posted a public request asking for some way to try clcache on a Windows machine which can build at least four (preferably more) C++ files concurrently. As it is, it's really hard to debug this since I usually cannot reproduce such problems for lack of a strong enough build machine. :-/
I had one of these again; as before, rerunning the build fixed it... https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.460
With 4 processes I did not encounter any error. With 8 processes and without directory watching, I got back to the open-pipe error. But I am not convinced that the ninja error is gone by removing the directory watch. I am rerunning with 8 cores again, let's see....
I am currently back to run with unlimited (24) processes and a warm cache. Also I updated to ninja 1.7.2 for an improved output and I run clcachesrv with Debug output.
I see the following: On the clcachesrv side I see 5 incoming connections that get handled correctly and then 5 matching closing connections. No errors or anything unusual.
On the ninja side I see how ninja starts 21 compile processes. 5 of them finish without error and 16 fail with the 'Invalid argument' error.
I think this SO article has some info regarding this problem, mainly that the error message might be wrong.
@frerich It sounds like you had another implementation of clcachesrv (without pyuv). Is it available somewhere so I could test against that version?
@akleber Thanks for the testing! Indeed, a first iteration of the server process was done without pyuv, but it had a couple of other problems due to which it wasn't really usable -- so I fear it may not be practical to give it a shot.
I think it's curious that it's always five processes which succeed, so I wondered whether maybe this is the length of the 'listen' backlog (think of the listen function when dealing with sockets). However, the pyuv.Pipe documentation suggests that there may be up to 511 waiting connections, so maybe that's not it...
The good news is that I'm now able to reproduce it, it seems I just never tried large-enough values! The following Python script is my test case:
import os
import sys
import multiprocessing

import clcache

NUM_CONCURRENT_TASKS = 14

def getFileHash(path):
    os.environ['CLCACHE_SERVER'] = "1"
    return clcache.getFileHashes([path])

if __name__ == '__main__':
    p = multiprocessing.Pool(NUM_CONCURRENT_TASKS)
    print(p.map(getFileHash, [sys.argv[1]] * NUM_CONCURRENT_TASKS))
What it does: it expects a running clcachesrv instance and the path to some file on the command line. It will then spawn NUM_CONCURRENT_TASKS concurrent processes to fetch the hash sum of that file. If everything works right, the program should print the same hash sum repeatedly.
For me, 14 seems to be the tipping point: it mostly works, but occasionally fails with the same error message you reported. 16 concurrent processes trigger the issue consistently.
@akleber I just pushed a hack (maybe let's call it an "experimental patch" :-)) which seems to help for my test case: it appears that in case open() raises an OSError with errno.EINVAL, this can be ignored and we can retry. As it is, the code will just spin -- which is not pretty, but good enough for a proof of concept. It would be great if you could give this version a shot to see whether it helps.
For what it's worth, while debugging I added a counter which tracks how often a clcache instance spins in getFileHashes while trying to open the named pipe handle. The numbers varied greatly for me (between 1 and 82 retries for 16 concurrent accesses). What did help, though, was to change the backlog size passed to the listen method in clcachesrv.py: I'd still get spinning, but a lot less.
I suspect that this might all just be caused by me not getting the handle management right in clcachesrv, i.e. not releasing pipe handles early enough or so. I couldn't find anything yet...
@akleber I believe I now fixed this issue properly. I found a document on Named Pipe Clients on the MSDN, and it explains:
A named pipe client uses the CreateFile function to open a handle to a named pipe. If the pipe exists but all of its instances are busy, CreateFile returns INVALID_HANDLE_VALUE and the GetLastError function returns ERROR_PIPE_BUSY. When this happens, the named pipe client uses the WaitNamedPipe function to wait for an instance of the named pipe to become available.
I failed to call WaitNamedPipe in case this condition is detected; doing so fixed my test case, and it seems I can now use an arbitrarily large number of concurrent clients. Can you give it a shot?
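The MSDN pattern translates into a client loop like the following sketch. CreateFile and WaitNamedPipe are passed in as callables here so the control flow can be shown without the Win32 bindings; the names and the PipeBusyError wrapper are hypothetical:

```python
class PipeBusyError(Exception):
    """Stand-in for CreateFile failing with ERROR_PIPE_BUSY (231)."""

def connect_to_pipe(pipe_name, create_file, wait_named_pipe, timeout_ms=10000):
    # Documented named-pipe client pattern: if all pipe instances are
    # busy, block via WaitNamedPipe until one frees up, then retry
    # CreateFile -- instead of spinning on the error.
    while True:
        try:
            return create_file(pipe_name)
        except PipeBusyError:
            wait_named_pipe(pipe_name, timeout_ms)
```

Unlike the earlier retry hack, the client now sleeps inside the wait call rather than hammering the pipe.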
@sasobadovinac Maybe you'd be willing to give it another shot, too? :-)
Sure :) but I can run just some general tests; since I hit this error very infrequently, I don't know how to force it...
Here are two builds from the commit before... I see you made some updates; should I rerun them? Is the run-jobs fix from master included here?
https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.479 msbuild time 38:09.34 https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.480 msbuild time 09:30.97
@sasobadovinac It appears that the version you tried is okay; the changes which happened after that are mostly cosmetic. For what it's worth, it was recently noted that the --onefile argument for pyinstaller makes things quite a bit slower. You could try omitting it and see whether it improves things for you.
I am afraid I don't know how to run it without --onefile on AppVeyor :\
https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.489/job/l0thxsba4k579pfr
Googling for that error message yields a lot of hits; it's apparently a symptom of a Python installation mixup -- which may well be the case, given that the AppVeyor VMs come with multiple Python installations. My superficial understanding is that the message means that you're using some Python version which tries to pick up libraries from a different Python installation. Maybe it's caused by your build script setting PATH to point to a different Python, but not PYTHONPATH (or PYTHONHOME).

Try setting PYTHONPATH and/or PYTHONHOME to C:\Python34; does that help?
For me clcachesrv now works. Also, I produced kind of an unbelievable result, but I was not able to find any problems/errors yet. So here we go: our build without clcache takes 45 min. With clcache and a cold cache it now takes 35 min, which I do not yet understand. But the best part is that the build with clcache and a warm cache now takes 7 min. clcachesrv was started before the cold-cache build and ran until after the warm-cache build; its memory footprint was 23 MB at the end. 7 min is unexpectedly fast, so please take these numbers with a grain of salt until I have verified them.

Also, I had to disable the directory watching. It was not needed for the test because no file changes. I needed to do this because it seems as if it keeps some kind of handle open, so that I am not able to delete watched directories. In our build directory, headers are generated and subsequently watched. In preparation for the next build this build directory is deleted, which did not work ("Access denied" error). So I had to remove the watching feature for this cold/warm test. However, in production this feature is absolutely necessary to invalidate hash-cache entries. I have not looked into this code, but maybe @frerich has an idea?
@akleber Thanks for giving it another try -- those numbers are encouraging; I hope there's nothing blatantly wrong. Those timings are better than I expected: the previous profiling reports already clearly showed that the runtime for warm caches is dominated by reading files and hashing their contents (for the sake of accessing cache entries), so I kind of expected things to improve. I didn't know how fast (or slow) named pipes would be though, i.e. what the maximum speedup would be.
As for the file monitoring: I just looked at the C code of libuv (which is what the pyuv module is based on) and see that it uses ReadDirectoryChangesW on Windows to implement the file system monitoring. This function roughly works by accepting a directory name and a callback; the callback is then invoked whenever anything in that directory changes. Some blog article I found explains:
Another potential pitfall of this function [ReadDirectoryChangesW] is that the referenced directory itself is now “in use” and so can't be deleted. To monitor files in a directory and still allow the directory to be deleted, you would have to monitor the parent directory and its children.
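The workaround the article hints at could be sketched like this; watch_directory stands in for whatever monitoring primitive is used (e.g. pyuv's fs-event handle) and is hypothetical here:

```python
import os

def watch_without_pinning(directory, watch_directory, callback):
    # Watch the parent instead of the directory itself, so the watched
    # directory is not held open and can still be deleted; forward
    # only the events that fall under the directory of interest.
    parent = os.path.dirname(directory)
    def on_event(changed_path):
        if os.path.normpath(changed_path).startswith(os.path.normpath(directory)):
            callback(changed_path)
    watch_directory(parent, on_event)
```

The cost of this design is extra, filtered-out events for unrelated siblings in the parent directory.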
Knowing this, I'm not sure how to proceed. Maybe we should whitelist/blacklist directories for clcachesrv such that it ignores certain ones (e.g. anything in the build directory or the temporary directory)? I think that would still give a lot of benefit, assuming that common system headers and includes of 3rd party libraries account for a large portion of the runtime.

Or maybe clcachesrv can be made to watch directories recursively, so that we only watch certain 'base directories' -- the downside of this being that we may get a lot more events than expected.
Some ideas:
It would make sense to run the caching server during one build only and still get most of its benefits, i.e. in the build job: clcachesrv start; (make|ninja|msbuild...); clcachesrv shutdown. Actually, it could reduce the number of cases that the server has to handle, like having an ever-growing number of watched folders/files, folders being in use, etc.
On the other hand, if a long running process is desired, it could also make sense to watch only folders in a white list (or complementarily ignore from a black list). For example, in our use case, we would watch only the compiler headers, windows headers and the folder where we put the external dependencies.
Even with those 2 solutions in place, the first one seems more appealing to distribute in a build node.
It would make sense to run the caching server during one build only and still get most of its benefits. I.e. in the build job: clcachesrv start; (make|ninja|msbuild...); clcachesrv shutdown.
There's still the risk that the build is trying to delete directories which are being watched, so for the sake of being defensive I tend to favor the whitelisting idea.
However, it might be kind of hard to tell what directories to whitelist...
I think for my use case it would be helpful to specify a regular expression of directory paths I would like not to be watched, e.g. something like --exclude ".*/build/.*". I do not have a list of paths I would like to be watched or not watched, but I can describe via a regular expression what the paths that get deleted look like. This should be fairly simple to implement.
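Such an --exclude option could come down to a check as small as this sketch; the option name and the should_watch helper are just illustrations of the proposal, not existing clcachesrv code:

```python
import re

def should_watch(directory, exclude_patterns):
    # Skip watching any directory whose (forward-slash) path matches
    # one of the user-supplied exclusion regexes, e.g. [r".*/build/.*"].
    normalized = directory.replace("\\", "/")
    return not any(re.search(pattern, normalized) for pattern in exclude_patterns)
```

The server would consult this before calling its _startWatching equivalent; excluded files would still be hashed, just never watched.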
Another possibility, although a lot more complicated to implement, might be the following: upon exiting, clcachesrv persists the hash cache (e.g. via pickle), and on start it loads the hash cache and makes sure it is up to date. This way one could have the clcachesrv process running while compiling, and not running during e.g. the cleanup of intermediate directories.
I think for my use case it would be helpful to specify a regular expression of directory paths I would like not to be watched.
Yes, I think a blacklist would be much easier to configure and it would work equally well. I think it's much easier to tell which directories are the 'volatile' ones (because your own build system tinkers with them) than to tell the (possibly long) list of directories from where you are pulling in headers without even knowing.
I got some more numbers. The base is the Ninja build as before, on the same 24-core machine; clcache is invoked via batch/python.

Without clcache, the compile time is 33 min. With clcache and a cold cache it's 45 min, and with a warm object cache it's 30 min. With clcachesrv, a cold object cache and a cold hash cache, the time is 37 min (an 8 min improvement from clcachesrv). With a warm object cache and a warm hash cache, the time is 8 min (a 22 min improvement from clcachesrv).

As I was still not convinced that the overhead of the header hashing by clcache should be so large, I disabled the hash cache inside clcachesrv. So I was using clcachesrv, but it essentially did what clcache normally does, just with a named pipe in between. This build took 30 min, which is the same time as clcache with a warm object cache and without clcachesrv. So the conclusion here is that using named pipes has no notable performance penalty in my environment.

I think my next step will be to implement my proposed blacklisting functionality and try to use clcache/clcachesrv in our CI build system, to get a better feeling for the effect of clcache with actual header/source changes. My simple rebuild tests are nice for some kind of baseline, but in the end it is not what is going on in our CI system...
This somewhat experimental patch introduces a new program called clcachesrv to the distribution. It's a server process which opens a named pipe and waits for incoming messages via that pipe; each message is a newline-separated list of file paths. For each message, clcachesrv will compute and cache the file hashes. The cached hashes are invalidated automatically, since clcachesrv uses the file system monitoring API to detect changes to the files whose hashes were cached.

The PR also introduces a new CLCACHE_SERVER variable which, when set, makes clcache use clcachesrv instead of calculating the hashes itself.

My hope is that this resolves (or at least improves the situation described in) #239 -- while discussing the unexpectedly poor performance of cache hits, the code for reading files and hashing them turned out to be dominant in the runtime profile. The clcachesrv server hopefully improves this since it will only read and cache files when needed, even across multiple clcache invocations.