frerich / clcache

A compiler cache for MSVC, much like ccache for gcc
Other
325 stars 84 forks source link

Incremental builds with MSBuild/Visual Studio are broken #227

Open Un1oR opened 8 years ago

Un1oR commented 8 years ago

I think it problem related with #33, but a bit more complicated. When I try incremental build, all solution fully rebuilds. I found it not happens if I comment body of PersistentJSONDict.save function. As I understand, MSBuild tracks writing a statistics file and mark projects in solution as outdated. Affects only solutions with two or more projects with build dependencies between it!

frerich commented 8 years ago

Can you provide a sample project which triggers this? On first sight, it sounds like another incarnation of https://github.com/frerich/clcache/wiki/Caveats#no-cache-hits-when-building-via-visual-studio-ide-or-msbuild -- i.e. the file tracking feature needs to be disabled. Does that help with incremental builds, too? If so, I guess that Wiki page should be extended.

Un1oR commented 8 years ago

Disabling tracking is incompatible with incremental builds, as far I'm understand. I'll add my test project a little bit later.

Un1oR commented 8 years ago

https://github.com/Un1oR/clcache-incremental-test Interesting thing I found. It affects only hot cache case. If I run build.bat first three times I haven't rebuilds at result. But if I remove build directories and run it againt, I will always have full rebuild. My log https://gist.github.com/Un1oR/81062c4a6e04ca3ec072ff986e0718e8.

Un1oR commented 8 years ago

In addition to my previous comment. If I comment registerCacheHit function body a problem goes away.

frerich commented 8 years ago

I suspect what happens is that due to the file tracing feature of Visual Studio, it notices that for any .cpp file being compiled, the stats.txt file is touched as well. So it considers that a 'dependency' of some sort: if a.cpp is touched, stats.txt needs to get touched -- and maybe Visual Studio also has a backling such that it thinks that if stats.txt is touched, a.cpp needs to be updated, too.

Thus, when doing an incremental build which merely compiles single .cpp files, it notices that doing so touches stats.txt (to count a cache hit) and thus it thinks that all other .cpp files will need to be recompiled as well? Sounnds unlikely...

I skimmed the documentation about Incremental Builds, but didn't spot anything which might help with avoiding this issue. My feeling is that if we just had a way to either disable tracking writes to stats.txt, this issue would be gone.

Un1oR commented 8 years ago

I have simplified example. Now it contains one project with two cpp files and problem still is reproduced.

I have tried to find workaround for disabling statistics file tracking. ExcludedInputPaths and TrackedOutputFilesToIgnore options wasn't have effect for me. Put cache directory into one of this folders is works, but it isn't useful solution, I think.

frerich commented 8 years ago

TrackedOutputFilesToIgnore sounds promising! I didn't see it mentioned too often though. It is listed in the Microsoft.Build.CommonTypes.xsd schema, but I see no mentioning of the term TrackedOutputFilesToIgnore in the C# source code of MSBuild. :-/

frerich commented 8 years ago

I think we should really start looking into integration tests which use clcache the same way Visual Studio does, maybe using msbuild instead of calling cl.exe triggers this already?

ivaigult commented 7 years ago

I'm observing the same behavior and even worse: compiled binaries don't contain my changes, but full rebuild helps.

frerich commented 7 years ago

Alas I don't use msbuild myself -- is it possible to detect incremental builds by looking at the cl command line? If so, we could at least treat such invocations as unsupported and don't try to restore any cached objects.

ivaigult commented 7 years ago

The command line is exactly same as for regular build, but I cannot reproduce this without Visual Studio or msbuild.

izmmisha commented 6 years ago

Would be nice if this problem solved by https://github.com/frerich/clcache/pull/319 PR.

Un1oR commented 6 years ago

I'll try to check it out soon!

RomanYudintsev commented 6 years ago

Hi, I compiled exe from https://github.com/izmmisha/clcache/tree/add-tracker-support and try to compile project https://github.com/Un1oR/clcache-incremental-test with it. Got fail https://gist.github.com/RomanYudintsev/9978f7dadcf090bb010f257e42837776#file-gistfile1-txt

izmmisha commented 6 years ago

@RomanYudintsev yeah, I saw such errors before.

cl : Command line warning D9002: ignoring unknown option '--multiprocessing-fork' [D:\test\clcache-incremental-test\app
\app.vcxproj]
cl : Command line warning D9024: unrecognized source file type 'parent_pid=9860', object file assumed [D:\test\clcache-
incremental-test\app\app.vcxproj]
cl : Command line warning D9024: unrecognized source file type 'pipe_handle=492', object file assumed [D:\test\clcache-
incremental-test\app\app.vcxproj]

This problem related to freezing of app. There is multiprocessing.freeze_support() to solve this problem, but looks like it not works in your case.

I'm using PyInstaller to create exe and it works. How you generate exe? Could you please show output of this process, probably it can show any error with multiprocessing module.

izmmisha commented 6 years ago

@RomanYudintsev also incremental build will not work with clcache_branch.exe executable, tracker will create clcache_branch*.tlog files, but msbuild will read only cl.*.tlog files, so you need to rename it to cl.exe or cl.cache_branch.exe so cl.*.exe

RomanYudintsev commented 6 years ago

build log for my clcache_branch.exe - https://gist.github.com/RomanYudintsev/4cd8657a0d682839bbceef128e9624b0 I'll try rename file

izmmisha commented 6 years ago

Ok, I see that I'm building exe in different way (I'm new to this project), should be fixed at https://github.com/izmmisha/clcache/commit/137a3e20003798b973b19a17f25e077247dc4617

Thank you @RomanYudintsev

RomanYudintsev commented 6 years ago

It's works! and not full rebuild :) Thanks @izmmisha

izmmisha commented 6 years ago

After some time of inspecting msbuild I found another things. v120 toolchain (VS2013) use hardcoded tlog names

    protected override string[] ReadTLogNames
    {
      get
      {
        return new string[2]
        {
          "cl.read.1.tlog",
          "cl.*.read.1.tlog"
        };
      }
    }

v140 toolchain use CLToolExe for filenames

    protected override string[] ReadTLogNames
    {
      get
      {
        string withoutExtension = Path.GetFileNameWithoutExtension(this.get_ToolExe());
        return new string[3]
        {
          withoutExtension + ".read.*.tlog",
          withoutExtension + ".*.read.*.tlog",
          withoutExtension + "-*.read.*.tlog"
        };
      }
    }

So @RomanYudintsev I see that your linker version is 14, probably you are using v140 toolchain. With v140 toolchain we don't need to rename clcache to cl.*, but it inspecting logs from all threads (not only main thread like v120). This mean that proposed solution (at https://github.com/frerich/clcache/pull/319 PR) with moving read stats.txt from main thread will not work.

But as mentioned earlier msbuild have some properties to set ignore files:

TrackedOutputFilesToIgnore sounds promising! I didn't see it mentioned too often though. It is listed in the Microsoft.Build.CommonTypes.xsd schema, but I see no mentioning of the term TrackedOutputFilesToIgnore in the C# source code of MSBuild. :-/

In case of stats.txt and manifests it is possible to setup TrackedInputFilesToIgnore using NoDependencies property, it can be set via msbuild command line option /p:NoDependencies=c:\path\to\clcache\* or via environment variable set NoDependencies=c:\path\to\clcache\*.

So probably with v140 toolchain you can use NoDependencies without clcache modification at all. v120 toolchain require to read of source files from main thread.

izmmisha commented 6 years ago

Ok, finally I found correct way how to make it work. FileTracker.dll exports API to suspend/resume tracking operations. PR https://github.com/frerich/clcache/pull/319 updated. It should works with v120, v140 toolchains. But it still necessary to froze clcache to cl.exe when v120 toolchain used. I don't think that this is a problem, just use ClToolPath=c:\path\to\clcache\ instead ClToolExe=clcache.exe v140 toolchain works with clcache.exe

I think we should really start looking into integration tests which use clcache the same way Visual Studio does

@frerich, Added some tests at https://github.com/frerich/clcache/pull/319/commits/408d609e7705ba00898da9a09491e9c8c89ae835 https://github.com/frerich/clcache/pull/319/commits/653ac12c3b89338e4f25fe9df8bca45d7cfbd693 (fixed v140 tests)

pthom commented 5 years ago

@izmmisha : I tested your fork on a large project (600 000 LoC, 173 projects).

It works almost perfectly! Thanks a lot! The compilation time dropped from 30 minutes to 3 minutes, and incremental builds now work flawlessly under MSVC 2015 and 2017 (and it integrates well with Qt moc).

I wrote a script to simplify the integration with Visual Studio : you can find it here:

However I have one small issue : sometimes one invocation of cl.exewill fail with the message

29>  PermissionError: [WinError 5] Access is denied: 'C:\\Users\\pascal\\clcache\\stats.txt.new' -> 'C:\\Users\\pascal\\clcache\\stats.txt'

This is extremey rare : the probability is about 0.1%, so it happens only with large projects with a large number of hits. Using clcache-server does not solve it.

However when I encounter, I just need to do a second build, and it always work (and build only the missing files).

Do you have an idea on how to solve this problem?

Here is the full trace when I have this issue:

29>  C:\Python36-32\Scripts\clcache.py Reusing cached object for key e6d9192be7989a90ce13033623593150 for object file ExternalCppUtils_GTest.dir\Release\CircularAccess_GTest.obj
29>  C:\Python36-32\Scripts\clcache.py Finished. Exit code 0
29>  concurrent.futures.process._RemoteTraceback:
29>  """
29>  Traceback (most recent call last):
29>    File "c:\python36-32\lib\concurrent\futures\process.py", line 175, in _process_worker
29>      r = call_item.fn(*call_item.args, **call_item.kwargs)
29>    File "c:\python36-32\lib\site-packages\clcache\__main__.py", line 1725, in processSingleSource
29>      return processDirect(cache, objectFile, compiler, cmdLine, sourceFile)
29>    File "c:\python36-32\lib\site-packages\clcache\__main__.py", line 1755, in processDirect
29>      return processCacheHit(cache, objectFile, cachekey)
29>    File "c:\python36-32\lib\site-packages\clcache\__main__.py", line 1537, in processCacheHit
29>      stats.registerCacheHit()
29>    File "c:\python36-32\lib\site-packages\clcache\__main__.py", line 153, in untrackedFunc
29>      return func(*args, **kwargs)
29>    File "c:\python36-32\lib\site-packages\clcache\__main__.py", line 795, in __exit__
29>      self._stats.save()
29>    File "c:\python36-32\lib\site-packages\clcache\__main__.py", line 703, in save
29>      json.dump(self._dict, f, sort_keys=True, indent=4)
29>    File "c:\python36-32\lib\contextlib.py", line 88, in __exit__
29>      next(self.gen)
29>    File "c:\python36-32\lib\site-packages\clcache\__main__.py", line 162, in atomicWrite
29>      os.replace(tempFileName, fileName)
29>  PermissionError: [WinError 5] Access is denied: 'C:\\Users\\pascal\\clcache\\stats.txt.new' -> 'C:\\Users\\pascal\\clcache\\stats.txt'
29>  """
29>
29>  The above exception was the direct cause of the following exception:
29>
29>  Traceback (most recent call last):
29>    File "C:\Python36-32\Scripts\clcache-script.py", line 11, in <module>
29>      load_entry_point('clcache==4.1.1.dev65+gc4a6f5e', 'console_scripts', 'clcache')()
29>    File "c:\python36-32\lib\site-packages\clcache\__main__.py", line 1619, in main
29>      return processCompileRequest(cache, compiler, sys.argv)
29>    File "c:\python36-32\lib\site-packages\clcache\__main__.py", line 1646, in processCompileRequest
29>      return scheduleJobs(cache, compiler, cmdLine, environment, sourceFiles, objectFiles)
29>    File "c:\python36-32\lib\site-packages\clcache\__main__.py", line 1704, in scheduleJobs
29>      exitCode, out, err, doCleanup = future.result()
29>    File "c:\python36-32\lib\concurrent\futures\_base.py", line 425, in result
29>      return self.__get_result()
29>    File "c:\python36-32\lib\concurrent\futures\_base.py", line 384, in __get_result
29>      raise self._exception
29>  PermissionError: [WinError None] Access is denied: 'C:\\Users\\pascal\\clcache\\stats.txt.new' -> 'C:\\Users\\pascal\\clcache\\stats.txt'
izmmisha commented 5 years ago

It works almost perfectly! Thanks a lot!

Glad to hear, thanks.

Do you have an idea on how to solve this problem?

I had such problem on my side. It related to windows explorer, you are keep open window with C:\Users\pascal\clcache\ folder. I don't know how to solve it, except close explorer... (probably any other software which can watch folder content can cause such problem too)

pthom commented 5 years ago

Thanks for your answer! Actually I had no explorer window opened on this folder.

So, I posted a workaround for the stats.txt issue : it is here https://github.com/pthom/clcache/commit/78a4630ed3730d9d6f93133656df7139c2d3e3ef

On my side, it works perfectly.

Do you want to add this patch to your PR? Or do you prefer that I post a separate PR?

Thanks!

izmmisha commented 5 years ago

Do you want to add this patch to your PR? Or do you prefer that I post a separate PR?

Well, would be better to identify the root of problem, so probably it introduced when used ProcessPoolExecutor, but I have only MSVS 2013 toolchain and it is necessary to make it work.

But if you are able to reproduce this problem on your side and it looks like you are using MSVS 2017 (or 2015), you can try revert this commit https://github.com/frerich/clcache/pull/319/commits/1aebba37d534db37ac78b45b4d36cdb71251f8e2 then we will know is ProcessPoolExecutor related to this problem or not.

pthom commented 5 years ago

Hi, I tried to revert your commit (and my modif), and the problem is still present. Your commit is probably out of cause and I think this problem is older than your PR.

I suspect the issue revolves around the locking mechanism inside the Statistics class. It instantiates a lock. However as far as I understand, it is never used.

So I tried to use the lock inside the __exit__ method as below :

    @untrackable
    def __exit__(self, typ, value, traceback):
        with self.lock:
            # Does not write to disc when unchanged
            self._stats.save()

But it did not solve the problem either.

I think the issue is difficult to solve by code :

See the class CacheLock : it tries to group a standard mutex together with a global windows mutex. This is quite tricky, and might fail sometimes.

class CacheLock:
    def __init__(self, mutexName, timeoutMs):
        self._mutexName = 'Local\\' + mutexName
        self._mutex = None
        self._timeoutMs = timeoutMs
    def acquire(self):
        if not self._mutex:
            self.createMutex()
        result = windll.kernel32.WaitForSingleObject(
            self._mutex, wintypes.INT(self._timeoutMs))
        if result not in [0, self.WAIT_ABANDONED_CODE]:
            if result == self.WAIT_TIMEOUT_CODE:
pthom commented 5 years ago

On a side note, I have discovered that this PR only works with python 32 bits.

This is due to the fact that MSVC and msbuild is itself a 32 bits program, and this PR needs to communicate with msbuild via a 32 bits dll (FileTracker.dll) see commit: https://github.com/frerich/clcache/pull/319/commits/2e13e112e886b5c86da2f6dd1182e1452f7242bb

izmmisha commented 5 years ago

@pthom Well, I did some tests using this simple script:

import concurrent.futures
import tempfile
import os

#os.environ['TRACKER_ENABLED']="1"

from clcache.__main__ import (
    Statistics,
)

#STATS_FILE = 'C:\\Users\\im\\clcache\\stats.txt' #problem
#STATS_FILE = tempfile.NamedTemporaryFile(dir='C:\\Users\\im\\clcache', suffix='.txt').name #problem
#STATS_FILE = tempfile.NamedTemporaryFile(suffix='.txt').name #no problem
#STATS_FILE = tempfile.NamedTemporaryFile(dir='C:\\Users\\im\\clcache').name #problem without txt extension too
STATS_FILE = 'C:\\clcache\\stats.txt' #no problem

def change_stats():
    stats = Statistics(STATS_FILE)
    with stats.lock, stats as s:
        s.registerCallWithInvalidArgument()

if __name__ == '__main__':
    print(STATS_FILE)
    while True:
        with concurrent.futures.ThreadPoolExecutor(4) as executor:
            futures = [executor.submit(change_stats) for _ in range(1000)]
            for future in concurrent.futures.as_completed(futures):
                future.result()

So, I'm able to reproduce this problem with ThreadPoolExecutor and without untracker's decorators:

C:\Users\im\clcache\tmpp4e3tv5w
Traceback (most recent call last):
  File "stattest.py", line 28, in <module>
    future.result()
  File "c:\Python34\lib\concurrent\futures\_base.py", line 395, in result
    return self.__get_result()
  File "c:\Python34\lib\concurrent\futures\_base.py", line 354, in __get_result
    raise self._exception
  File "c:\Python34\lib\concurrent\futures\thread.py", line 54, in run
    result = self.fn(*self.args, **self.kwargs)
  File "stattest.py", line 19, in change_stats
    s.registerCallWithInvalidArgument()
  File "c:\Python34\lib\site-packages\clcache\__main__.py", line 809, in __exit__
    self._stats.save()
  File "c:\Python34\lib\site-packages\clcache\__main__.py", line 719, in save
    json.dump(self._dict, f, sort_keys=True, indent=4)
  File "c:\Python34\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "c:\Python34\lib\site-packages\clcache\__main__.py", line 162, in atomicWrite
    os.replace(tempFileName, fileName)
PermissionError: [WinError 5] Отказано в доступе: 'C:\\Users\\im\\clcache\\tmpp4e3tv5w.new' -> 'C:\\Users\\im\\clcache\\tmpp4e3tv5w'

The thing I get from this test is that I'm able to easily reproduce such problem only when file located at %USERPROFILE%\clcache, if it located at %USERPROFILE%\AppData\Local\Temp there is no error, if it located at C:\clcache\ also no problem (and as you mentioned before windows explorer is not related to this).

So probably this problem should be added to caveat section of wiki.

9a4gl commented 5 years ago

On a side note, I have discovered that this PR only works with python 32 bits.

This is due to the fact that MSVC and msbuild is itself a 32 bits program, and this PR needs to communicate with msbuild via a 32 bits dll (FileTracker.dll) see commit: 2e13e11

I am using x86 toolchains and x64 python, and this commit helped in my case: https://github.com/9a4gl/clcache/commit/dd15bd8f6faeebd8a4d3a0a677737bd07812fe5b