Closed: frerich closed this pull request 8 years ago
@@            master     #244    diff @@
==========================================
  Files            1        1
  Lines         1006     1014      +8
  Methods          0        0
  Messages         0        0
  Branches       164      169      +5
==========================================
+ Hits           895      905     +10
  Misses          83       83
+ Partials        28       26      -2
Powered by Codecov. Last update 005e0f7...17bc495
This helps in my use case. The version on master requires a huge timeout (600000 ms); with these changes it compiles successfully with the default (10000 ms). 👍
About the compilation times: the cold-cache case is a bit faster and the hot-cache case is very similar.
I will have a look at the code too.
Here is the result of a run with CLCACHE_PROFILE enabled: coldcache.zip
Nice idea. About your thought that it "might compile the same thing twice": I doubt that's a worry, since even without clcache, concurrently invoking cl on the same files with the same settings and arguments would cause one of the invocations to fail anyway.
@siu Thanks a lot for your thorough review, despite the fact that this patch is not exactly trivial.
Also, thanks a lot for actually trying it on a real project and providing the CLCACHE_PROFILE output. I was mostly worried that I somehow missed a case where two cache updates might clash, but that appears not to be the case? Alas, these concurrency-related tickets are very hard for me to debug since the strongest Windows machine I have access to has only two cores, so there's only very limited 'real' concurrency.
The profile you uploaded was quite interesting:
- Almost all of the time is spent in WaitForSingleObject, and the exclusive caller of that function is subprocess.wait, i.e. 95.5% of the time was spent waiting for the compiler. I wonder whether this allows claiming that clcache currently incurs ~5% overhead for cache misses.
- A good share of the remaining time goes into createManifestEntry, which is dominated by the calls to getFileHash.
My impression is that there is room for improvement -- but not much, at least not for cache misses. I think it's worthwhile to start working on a background process which takes care of hashing files and memorising those hashes; cache misses would probably benefit a little, but it would be a major improvement for cache hits.
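To illustrate the memoisation half of that idea, here is a minimal sketch -- not clcache's actual getFileHash, and the digest choice is merely illustrative -- of remembering file hashes keyed on a cheap (mtime, size) fingerprint. The proposed background process would keep such a table alive across clcache invocations; here it only lives in-process:

```python
import hashlib
import os

# Hypothetical memo table: maps a file path to ((mtime_ns, size), digest).
# A background process would hold this across clcache invocations.
_hashCache = {}

def getFileHashMemoized(path):
    st = os.stat(path)
    fingerprint = (st.st_mtime_ns, st.st_size)
    cached = _hashCache.get(path)
    if cached is not None and cached[0] == fingerprint:
        return cached[1]  # file apparently unchanged, reuse the digest
    hasher = hashlib.md5()  # digest algorithm is illustrative only
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            hasher.update(chunk)
    digest = hasher.hexdigest()
    _hashCache[path] = (fingerprint, digest)
    return digest
```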
Note that WaitForSingleObject is not only called from subprocess.wait; it is also the call used to acquire the locks. In this case, though, that entry in the profile seems to be dominated by the subprocess call.
In fact, I tried applying some of the changes I was suggesting locally and I could not see any relevant change in the profile. The time spent reading manifests and deciding what to do is negligible compared to the time it takes to compile, and it didn't help with a hot cache either.
I think we should profile other use cases before drawing any conclusions. Maybe a mixed execution with only 50% of the files in the cache would lead to more manifest or artifact section clashes and could benefit more from reducing the time for which the manifest section is locked. Anyway, all of this can be done later; for the moment it would just add complexity.
I had a second look at the code. It looks safe in my opinion.
It's true that WaitForSingleObject is used in multiple places. However, in the profile report you generated, I see these lines further down (in the section which depicts from where each function was called):
Function                                was called by...
                                            ncalls  tottime   cumtime
[...]
subprocess.py:1154(wait)                <-    1214    0.078  9460.795  clcache.py:1159(invokeRealCompiler)
{built-in method WaitForSingleObject}   <-    1214 9460.709  9460.709  subprocess.py:1154(wait)
I.e. in this case, it appears that the only caller of WaitForSingleObject was subprocess.py:1154(wait), and the only caller of that was invokeRealCompiler. Quite frankly, it's not clear why the WaitForSingleObject calls done by CacheLock.acquire don't show up; in fact, there's no mention of the CacheLock calls at all. Maybe they were just so fast that the profiler decided not to list them? Strange...
Ah, yes, that is true. The missing calls in the profile may be a limitation of cProfile?
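For reference, a self-contained sketch (using a dummy child process instead of a real compiler, and a hypothetical invokeRealCompiler stand-in) of how such a "was called by..." section is produced with cProfile and pstats:

```python
import cProfile
import pstats
import subprocess
import sys

def invokeRealCompiler():
    # Stand-in for clcache's compiler invocation: spawn a child process
    # so that the wait inside subprocess dominates the profile.
    subprocess.call([sys.executable, "-c", "pass"])

cProfile.run("invokeRealCompiler()", "profile.out")

# print_callers() prints the "was called by..." section quoted above,
# restricted here to functions whose name matches "wait".
pstats.Stats("profile.out").print_callers("wait")
```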
Glad to hear that there's mostly positive feedback so far. I'm curious how it works for @akleber who reported #229 in the first place. :-)
Thank you very much, @frerich, for this PR, which I am eager to test; unfortunately, I will not be able to test it before 28.11.
@akleber Sure, no rush. For what it's worth, I also just submitted #248 which hopefully is an effective approach to tackling #239 :-)
Before these changes I wasn't able to compile our project due to the timeout. I played around with CLCACHE_OBJECT_CACHE_TIMEOUT_MS but it didn't help at all. With these changes everything works fine and I've got a 4x build speedup :-). Thank you so much!
I can also confirm good results with this and the latest master for both #155 and #229 https://ci.appveyor.com/project/sasobadovinac/freecad/build/1.0.377
@frerich This PR solves my issue reported in #229. It did not improve the overall build time by a significant amount, but it gets rid of the timeout. So a big thumbs up for this PR. Thanks. Looking into #248 now...
@frerich I am sorry to report that I messed up my tests with this PR: by mistake, I did not use a clcache version with this change. Sorry about that. I re-ran my tests and still get a timeout. However, this PR improves my situation, as my build now passes with a timeout of "only" 20 s instead of the 60 s I previously had to use. I have not had time to analyze the timeout situation yet, but I will look into it.
@akleber No worries, it's of course perfectly possible that there are still other scenarios in which timeouts are triggered unexpectedly; the reports from you and @TiloW in #229 suggest as much. I think there is some consensus that this PR implements a gentler locking strategy, so it moves things in the right direction. The remaining problems can be tracked in separate issues.
This PR attempts to address the issue observed in #229. The theory is that lock timeouts were triggered because, in direct mode, clcache holds two locks while invoking the real compiler: one manifest section lock and one artifact section lock. So despite the fact that there are 256 sections of each category, if a compiler execution takes sufficiently long, it's quite possible that a concurrent clcache invocation tries to access the same manifest or artifact section -- and times out.
The patch supplied by this PR implements a more optimistic locking strategy: when a cache miss is detected, clcache releases all locks, compiles, and then acquires the locks again and checks whether the cache still needs updating. This means there is a minuscule chance that clcache will compile the same object twice (in case two concurrent invocations are exactly equivalent), but it optimizes the common case of concurrent invocations being independent.
This behaviour was implemented by basically inlining a lot of code into processDirect and processNoDirect and then reshuffling things such that the locks are not acquired while calling the real compiler.
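For illustration, here is a minimal, self-contained sketch of that release-compile-reacquire-recheck flow; the lock objects, the cache dictionary and the function names are placeholders, not clcache's actual processDirect code:

```python
import threading

# Placeholders for a manifest section lock, an artifact section lock,
# and the cache storage itself; none of this is the real clcache code.
manifestLock = threading.Lock()
artifactLock = threading.Lock()
cache = {}

def compileReal(key):
    # Stand-in for invoking the real compiler on a cache miss.
    return "artifacts for %s" % key

def processOptimistically(key):
    # First look-up: both locks held, but only briefly.
    with manifestLock, artifactLock:
        if key in cache:
            return cache[key]

    # Cache miss: compile WITHOUT holding any locks, so a long compiler
    # run can no longer make concurrent invocations time out.
    artifacts = compileReal(key)

    # Re-acquire the locks and re-check: an exactly equivalent concurrent
    # invocation may have stored the entry in the meantime, in which case
    # this invocation compiled the same object twice -- the minuscule,
    # accepted cost of the optimistic strategy.
    with manifestLock, artifactLock:
        if key not in cache:
            cache[key] = artifacts
        return cache[key]

print(processOptimistically("example.obj"))
```

The key property is that no lock is held across compileReal, so lock hold times shrink from the duration of a full compile to the duration of a cache look-up or store.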