frerich / clcache

A compiler cache for MSVC, much like ccache for gcc
Other
325 stars 83 forks source link

Replace custom atomicWrite by more robust library call. #334

Closed benjaminfuchs closed 5 years ago

benjaminfuchs commented 5 years ago

Issue

We are sometimes seeing following exception when we run our build with many cores:

Failed to execute script clcache_main
         Traceback (most recent call last):
           File "clcache_main.py", line 2, in <module>
           File "clcache\__main__.py", line 1596, in main
           File "clcache\__main__.py", line 1623, in processCompileRequest
           File "clcache\__main__.py", line 1675, in scheduleJobs
           File "concurrent\futures\_base.py", line 398, in result
           File "concurrent\futures\_base.py", line 357, in __get_result
           File "concurrent\futures\thread.py", line 55, in run
           File "clcache\__main__.py", line 1696, in processSingleSource
           File "clcache\__main__.py", line 1760, in processDirect
           File "clcache\__main__.py", line 1785, in ensureArtifactsExist
           File "clcache\__main__.py", line 756, in __exit__
           File "clcache\__main__.py", line 666, in save
           File "contextlib.py", line 66, in __exit__
           File "clcache\__main__.py", line 125, in atomicWrite
         PermissionError: [WinError 5] Access is denied: 'XXXX\\stats.txt.new' -> 'XXXX\\stats.txt'

Searching a bit makes it look like there is a race condition:

Anyway, the os.replace docs could state in general that: "[o]n Windows, a PermissionError will be raised if dst is a read-only file, or if either src or dst is currently open, or if src is a directory with an open file". https://bugs.python.org/issue27886

Looking with that in mind at the atomicWrite function:

@contextlib.contextmanager
def atomicWrite(fileName):
    tempFileName = fileName + '.new'
    with open(tempFileName, 'w') as f:
# One thread can have a open file handler on the tempFile thats always "stats.txt" ...
        yield f
    os.replace(tempFileName, fileName)
# ... while another thread is calling os.replace() causing PermssionError.

Solution

Replacing the atomicWrite function with atomic_write from the atomicwrites package seems to solve the issue (https://pypi.org/project/atomicwrites/). The package takes care of such race conditions so we no longer have to think about it.

frerich commented 5 years ago

The atomicwrites package is a great find, I didn't even know that exists!

The patch seems plausible to me, merging this. Thanks!

izmmisha commented 5 years ago

Hi, just checked this patch with this simple test:

import concurrent.futures
import tempfile
import os

#os.environ['TRACKER_ENABLED']="1"

from clcache.__main__ import (
    Statistics,
)

STATS_FILE = 'C:\\Users\\im\\clcache\\stats.txt' #problem
#STATS_FILE = tempfile.NamedTemporaryFile(dir='C:\\Users\\im\\clcache', suffix='.txt').name #problem
#STATS_FILE = tempfile.NamedTemporaryFile(suffix='.txt').name #no problem
#STATS_FILE = tempfile.NamedTemporaryFile(dir='C:\\Users\\im\\clcache').name #problem without txt extension too
#STATS_FILE = 'C:\\clcache\\stats.txt' #no problem

def change_stats():
    stats = Statistics(STATS_FILE)
    with stats.lock, stats as s:
        s.registerCallWithInvalidArgument()

if __name__ == '__main__':
    print(STATS_FILE)
    while True:
        with concurrent.futures.ThreadPoolExecutor(4) as executor:
            futures = [executor.submit(change_stats) for _ in range(1000)]
            for future in concurrent.futures.as_completed(futures):
                future.result()

Without this fix:

C:\Users\im\clcache\stats.txt
Traceback (most recent call last):
  File "..\awtest.py", line 29, in <module>
    future.result()
  File "C:\Python34\lib\concurrent\futures\_base.py", line 395, in result
    return self.__get_result()
  File "C:\Python34\lib\concurrent\futures\_base.py", line 354, in __get_result
    raise self._exception
  File "C:\Python34\lib\concurrent\futures\thread.py", line 54, in run
    result = self.fn(*self.args, **self.kwargs)
  File "..\awtest.py", line 20, in change_stats
    s.registerCallWithInvalidArgument()
  File "C:\Python34\lib\site-packages\clcache\__main__.py", line 795, in __exit__
    self._stats.save()
  File "C:\Python34\lib\site-packages\clcache\__main__.py", line 703, in save
    json.dump(self._dict, f, sort_keys=True, indent=4)
  File "C:\Python34\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Python34\lib\site-packages\clcache\__main__.py", line 162, in atomicWrite
    os.replace(tempFileName, fileName)
PermissionError: [WinError 5] Отказано в доступе: 'C:\\Users\\im\\clcache\\stats.txt.new' -> 'C:\\Users\\im\\clcache\\stats.txt'

With this fix:

C:\repository\git\clcache>\Python34\python.exe ..\awtest.py
C:\Users\im\clcache\stats.txt
Traceback (most recent call last):
  File "..\awtest.py", line 29, in <module>
    future.result()
  File "C:\Python34\lib\concurrent\futures\_base.py", line 395, in result
    return self.__get_result()
  File "C:\Python34\lib\concurrent\futures\_base.py", line 354, in __get_result
    raise self._exception
  File "C:\Python34\lib\concurrent\futures\thread.py", line 54, in run
    result = self.fn(*self.args, **self.kwargs)
  File "..\awtest.py", line 20, in change_stats
    s.registerCallWithInvalidArgument()
  File "C:\Python34\lib\site-packages\clcache\__main__.py", line 749, in __exit__
    self._stats.save()
  File "C:\Python34\lib\site-packages\clcache\__main__.py", line 659, in save
    json.dump(self._dict, f, sort_keys=True, indent=4)
  File "C:\Python34\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\Python34\lib\site-packages\atomicwrites\__init__.py", line 152, in _open
    self.commit(f)
  File "C:\Python34\lib\site-packages\atomicwrites\__init__.py", line 177, in commit
    replace_atomic(f.name, self._path)
  File "C:\Python34\lib\site-packages\atomicwrites\__init__.py", line 89, in replace_atomic
    return _replace_atomic(src, dst)
  File "C:\Python34\lib\site-packages\atomicwrites\__init__.py", line 71, in _replace_atomic
    _windows_default_flags | _MOVEFILE_REPLACE_EXISTING
  File "C:\Python34\lib\site-packages\atomicwrites\__init__.py", line 66, in _handle_errors
    raise WinError()
PermissionError: [WinError 5] Отказано в доступе.
benjaminfuchs commented 5 years ago

@izmmisha Thanks for the test. I did some test runs with it. Both implementations with 200.000 calls to change_stats():

old implementation: 10 fails
new implementation: 2 fails

Did you also see such kind of frequency of fails?

I think the race condition that tempFileName was always stats.txt.new for every thread is solved. But there is still a race condition if 2 threads try to write the new content to stats.txt. I think this can only be solved by some kind of mutex guarding (what is aleardy done in some other parts of clcache).

izmmisha commented 5 years ago

It need more investigation, the problem exists only in user home directory (except temp directory), the simplest solution right now is move clcache folder out of user home directory.

benjaminfuchs commented 5 years ago

@izmmisha Looking at this post https://github.com/nodejs/node-gyp/issues/1124 maybe the default CLCACHE_DIR location should be in %LOCALAPPDATA% on windows? I ran the test with this folder and didn't get any errors. No idea what magic happens in %USERPROFILE%.

punitchandra commented 4 years ago

When can this be released?