godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
90.71k stars 21.12k forks source link

On Windows at least building with a -j > 1 scons parameter fails with a file not found even though the file exists #5042

Closed corekase closed 6 years ago

corekase commented 8 years ago

Operating system or device - Godot version:

Windows 10 Pro 64-bit, Godot source 64-bit build (tested with master branch)

Visual Studio 2015 Community, Windows 10 SDK Kit 10586.

Python 2.7.11, Scons 2.5.0, and Pywin32 220 all these tools are 32-bit.

Issue description (what happened, and what was expected):

When I issue the command:

scons platform=windows target=release_debug -jx

Where "x" is a number > 1 then the build process will eventually give a file not found error. Going to where it says the file is not found it is present. Specifying -j1 on the build line works fine.

Steps to reproduce:

scons platform=windows target=release_debug -j2

Grass-H0PEr commented 7 years ago

Windows 7 compiling Godot from main github source [on 2017-Oct-5th].

Same issue with scons -j# platform=windows failing to build most of the time.

The only fatal error I kept getting was for "certs_compressed.gen.h" file ( which does exist ):

editor\editor_initialize_ssl.cpp(32): fatal error C1083: Cannot open include file: 'certs_compressed.gen.h': No such file or directory

Multiple fails to build Godot, each time that was the only fatal error I got.

Posting here for reference in case it helps.

mhilbrunner commented 6 years ago

I'm getting these errors 90% of the time with -j >1 on Scons 3.0. enough so that I always use a single thread to compile on Windows, which isn't optimal. :/

akien-mga commented 6 years ago

I'm getting these errors 90% of the time with -j >1 on Scons 3.0. enough so that I always use a single thread to compile on Windows, which isn't optimal. :/

These errors only happen when the files aren't generated yet. If you've built once with -j1, you can build with several threads afterwards just fine. Unless you remove all your build objects before each build of course.

viktor-ferenczi commented 6 years ago

Tested with latest master [aeb1c67b] and scons -j16 platform=windows on x64 Native console:

[Initial build] Compiling ==> editor\editor_initialize_ssl.cpp
editor_initialize_ssl.cpp
editor\editor_initialize_ssl.cpp(33): fatal error C1083: Cannot open include file: 'certs_compressed.gen.h': No such file or directory
scons: *** [editor\editor_initialize_ssl.windows.tools.64.obj] Error 2

Problem still persists, at least on 64 bit Windows.

Environment:

C:\godot>python --version
Python 2.7.14

C:\godot>scons --version
SCons by Steven Knight et al.:
        script: v3.0.0.rel_3.0.0:4395:8972f6a2f699, 2017/09/18 12:59:24, by bdbaddog on ubuntu-16
        engine: v3.0.0.rel_3.0.0:4395:8972f6a2f699, 2017/09/18 12:59:24, by bdbaddog on ubuntu-16
        engine path: ['C:\\Python27\\scons-3.0.0\\SCons']
Copyright (c) 2001 - 2017 The SCons Foundation

C:\godot>cl
Microsoft (R) C/C++ Optimizing Compiler Version 19.12.25835 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

(Visual Studio 2017 Professional)

"Error 2" means file not found, so it looks like this is not a permission issue.

The problem also happens with -j1, but less likely.

Trying to run the build with PyCharm's debugger...

garyo commented 6 years ago

This kind of thing is usually a missing dependency: SCons doesn't know that editor\editor_initialize_ssl.cpp depends on certs_compressed.gen.h for whatever reason so it gets the build order wrong. Perhaps CPPPATH doesn't include that dir, or something (that's how SCons finds dependent headers). If you run scons --tree=prune do you see that dependency? (The first time, meaning a clean build). Maybe print out env['CPPPATH'] at the end of SConstruct also. scons ... -debug=includes can also perhaps help. That will show you what SCons thinks are the needed includes for each target.

viktor-ferenczi commented 6 years ago

Thank you for your useful hints.

Command used: scons --cache-disable --debug=prepare --debug=time --debug=explain --debug=findlibs --debug=includes --tree=prune -j16 platform=windows >build.log 2>&1

The build folder was excluded from virus scanning.

I also hacked timestamps into the stdout of scons to reveal the wall clock order of events.

CPPPATH looks all right:

2018-03-10T19:57:10.105000: CPPPATH: ['#core', '#core/math', '#editor', '#drivers', '#', '#platform/windows', '#thirdparty/zstd', '#thirdparty/zstd/common', '#thirdparty/zlib/', '#thirdparty/rtaudio/', '#thirdparty/glad/', '#thirdparty/freetype/', '#thirdparty/freetype//include', '#thirdparty/libpng', '#thirdparty/recastnavigation/Recast/', '#thirdparty/recastnavigation/Recast//Include', '#thirdparty/nanosvg/']

All four build attempts failed. The issue happened to two different generated header files:

Dependency tree was correct in all cases. Both generated headers were properly recognized as dependencies for the corresponding object file. The object file is what's built from the cpp file including the generated header, so this is good.

The wall clock order of build events were also right, nothing was attempted in the wrong order.

Only discrepancy I found is double registration of the original data file as dependency in case of generated header files:

2018-03-10T19:54:54.638000:   |   | | +-editor\certs_compressed.gen.h
2018-03-10T19:54:54.638000:   |   | | | +-thirdparty\certs\ca-certificates.crt
2018-03-10T19:54:54.638000:   |   | | | +-thirdparty\certs\ca-certificates.crt
2018-03-10T19:58:10.020000:   |   | | +-editor\license.gen.h
2018-03-10T19:58:10.020000:   |   | | | +-COPYRIGHT.txt
2018-03-10T19:58:10.020000:   |   | | | +-LICENSE.txt
2018-03-10T19:58:10.020000:   |   | | | +-COPYRIGHT.txt
2018-03-10T19:58:10.020000:   |   | | | +-LICENSE.txt
viktor-ferenczi commented 6 years ago

Ended up debugging the entire scons.py process in PyCharm. It works fine even with -j16, but decreases the probability of the issue for some reason. Tests can be sped up considerably by running the whole build on a ramdisk, like ImDisk.

Finally managed to catch the issue in action by putting a breakpoint into the error handling inside C:\Python27\scons-3.0.0\SCons\Action.py right on the branch handling the error:

    def execute(self, target, source, env, executor=None):
            ...
            result = spawn(shell, escape, cmd_line[0], cmd_line, ENV)
            if not ignore and result:
                msg = "Error %s" % result      # <<< Put breakpoint here!
                return SCons.Errors.BuildError(errstr=msg,
                                               status=result,
                                               action=self,
                                               command=cmd_line)

In-situ debug information:

os.getcwd() => 'G:\\godot'
result = 2
cmd_line = [
 'cl',
 '/Fomain\\main.windows.tools.64.obj',
 '/c',
 '"main\\main.cpp"',
 '/TP',
 '/TP',
 '/nologo',
 '/Z7',
 '/DDEBUG_ENABLED',
 '/DDEBUG_MEMORY_ENABLED',
 '/DD3D_DEBUG_INFO',
 '/Od',
 '/EHsc',
 '/MT',
 '/Gd',
 '/GR',
 '/nologo',
 '"/IC:\\Program Files (x86)\\Windows Kits\\10\\/Include"',
 '/DWINDOWS_ENABLED',
 '/DOPENGL_ENABLED',
 '/DRTAUDIO_ENABLED',
 '/DWASAPI_ENABLED',
 '/DTYPED_METHOD_BIND',
 '/DWIN32',
 '/DWINVER=0x0601',
 '/D_WIN32_WINNT=0x0601',
 '/D_WIN64',
 '"/IC:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Professional\\VC\\Tools\\MSVC\\14.12.25827\\ATLMFC\\include"',
 '"/IC:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Professional\\VC\\Tools\\MSVC\\14.12.25827\\include"',
 '"/IC:\\Program Files (x86)\\Windows Kits\\NETFXSDK\\4.6.1\\include\\um"',
 '"/IC:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.16299.0\\ucrt"',
 '"/IC:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.16299.0\\shared"',
 '"/IC:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.16299.0\\um"',
 '"/IC:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.16299.0\\winrt"',
 '/I',
 '/w',
 '/EHsc',
 '-DZSTD_STATIC_LINKING_ONLY',
 '-DFT2_BUILD_LIBRARY',
 '-DZLIB_DEBUG',
 '-DFREETYPE_ENABLED',
 '-DFT_CONFIG_OPTION_USE_PNG',
 '-DRECAST_ENABLED',
 '-DSVG_ENABLED',
 '-DDEBUG_MEMORY_ALLOC',
 '-DSCI_NAMESPACE',
 '/DMSVC',
 '/GR',
 '-DPTRCALL_ENABLED',
 '-DTOOLS_ENABLED',
 '-DGDSCRIPT_ENABLED',
 '-DMINIZIP_ENABLED',
 '-DXML_ENABLED',
 '-DGLAD_ENABLED',
 '-DGLES_OVER_GL',
 '/Icore',
 '/Icore\\math',
 '/Ieditor',
 '/Idrivers',
 '/I.',
 '/Iplatform\\windows',
 '/Ithirdparty\\zstd',
 '/Ithirdparty\\zstd\\common',
 '/Ithirdparty\\zlib',
 '/Ithirdparty\\rtaudio',
 '/Ithirdparty\\glad',
 '/Ithirdparty\\freetype',
 '/Ithirdparty\\freetype\\include',
 '/Ithirdparty\\libpng',
 '/Ithirdparty\\recastnavigation\\Recast',
 '/Ithirdparty\\recastnavigation\\Recast\\Include',
 '/Ithirdparty\\nanosvg'
]

Include folders only:

['/I',
 '/Icore',
 '/Icore\\math',
 '/Ieditor',
 '/Idrivers',
 '/I.',
 '/Iplatform\\windows',
 '/Ithirdparty\\zstd',
 '/Ithirdparty\\zstd\\common',
 '/Ithirdparty\\zlib',
 '/Ithirdparty\\rtaudio',
 '/Ithirdparty\\glad',
 '/Ithirdparty\\freetype',
 '/Ithirdparty\\freetype\\include',
 '/Ithirdparty\\libpng',
 '/Ithirdparty\\recastnavigation\\Recast',
 '/Ithirdparty\\recastnavigation\\Recast\\Include',
 '/Ithirdparty\\nanosvg']

Generated headers included in main.cpp:

#include "app_icon.gen.h"
#include "splash.gen.h"
#include "splash_editor.gen.h"

These files are saved into the main subdirectory. Since we build main/main.cpp, those three files are on the current directory (/I.) from the compiler's perspective (relative to the source file). So they should be found, but they are not.

I verified, all three files are there. Unfortunately I did not have the exact timestamp of the issue, so could not verify whether all required header files were there at the time the compiler was executed. That's the next step to verify.

Just to make sure, I've opened a Visual Studio x86 native command prompt in the G:\godot folder and executed the same command. It worked and built the object file with no error:

cl /Fomain\main.windows.tools.64.obj /c "main\main.cpp" /TP /TP /nologo /Z7 /DDEBUG_ENABLED /DDEBUG_MEMORY_ENABLED /DD3D_DEBUG_INFO /Od /EHsc /MT /Gd /GR /nologo "/IC:\Program Files (x86)\Windows Kits\10\/Include" /DWINDOWS_ENABLED /DOPENGL_ENABLED /DRTAUDIO_ENABLED /DWASAPI_ENABLED /DTYPED_METHOD_BIND /DWIN32 /DWINVER=0x0601 /D_WIN32_WINNT=0x0601 /D_WIN64 "/IC:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\ATLMFC\include" "/IC:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\include" "/IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "/IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\ucrt" "/IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\shared" "/IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\um" "/IC:\Program Files (x86)\Windows Kits\10\include\10.0.16299.0\winrt" /I /w /EHsc -DZSTD_STATIC_LINKING_ONLY -DFT2_BUILD_LIBRARY -DZLIB_DEBUG -DFREETYPE_ENABLED -DFT_CONFIG_OPTION_USE_PNG -DRECAST_ENABLED -DSVG_ENABLED -DDEBUG_MEMORY_ALLOC -DSCI_NAMESPACE /DMSVC /GR -DPTRCALL_ENABLED -DTOOLS_ENABLED -DGDSCRIPT_ENABLED -DMINIZIP_ENABLED -DXML_ENABLED -DGLAD_ENABLED -DGLES_OVER_GL /Icore /Icore\math /Ieditor /Idrivers /I. /Iplatform\windows /Ithirdparty\zstd /Ithirdparty\zstd\common /Ithirdparty\zlib /Ithirdparty\rtaudio /Ithirdparty\glad /Ithirdparty\freetype /Ithirdparty\freetype\include /Ithirdparty\libpng /Ithirdparty\recastnavigation\Recast /Ithirdparty\recastnavigation\Recast\Include /Ithirdparty\nanosvg

viktor-ferenczi commented 6 years ago

Reproduced the same problem again, this time outside the debugger with extra logging.

Verified that all three generated header files in the main directory did exist and last modified at least 250ms before the compiler command issued.

Started the build with cleaning, then:

set PYTHONUNBUFFERED=1
scons --cache-disable --debug=prepare --debug=time --debug=explain --debug=findlibs --debug=includes --tree=prune -j16 platform=windows >build.log 2>&1

Full log

Grep in the log for thread ID: [31988]

GEN: Looks up the generated files and their st_mtime after the failure happened. You can compare them with the START TIME of the compiler command. Doing it before each command execution adds enough delay to decrease the bug probability considerably, so no repro anymore. Looks like a timing issue / race condition, indeed. TODO: Figure out the precision of NTFS modify timestamps.

Command was started:

2018-03-11T03:34:07.046000 [31988]: START TIME: 1520735647.046000

It failed with Error 2, then the debugging code I added looked for the generated headers in the project. These are the relevant ones for main.cpp which it found:

2018-03-11T03:34:07.706000 [31988]: GEN: .\main\app_icon.gen.h [mtime=1520735644.882805]
2018-03-11T03:34:07.706000 [31988]: GEN: .\main\splash.gen.h [mtime=1520735646.800466]
2018-03-11T03:34:07.706000 [31988]: GEN: .\main\splash_editor.gen.h [mtime=1520735644.547750]

File mtime 1520735646.800466 is about 250ms before the execution timestamp: 1520735647.046000

So all three files were there, but still appeared as missing for the compiler.

Executing the compiler manually with the exact same command works fine, certainly.

Conclusion: Some files are not usable for a short period of time after writing them out. Likely we have to add a workaround (on Windows) which checks for the readability of all required generated header files before invoking the compiler. Another solution would be to retry the compiler command in case of Error 2 result, but it might need modifications to scons itself.

UPDATE: One more build with two subsequent failures in it. See full logs

During the first run the unreadable file was splash_editor.gen.h, the file's timestamp was 2.5 seconds (!) earlier than the CL is started based on wall clock timestamps logged. So the file was definitely there, just unreadable for some reason. For the full picture, this time I forgot to add the ramdisk to the exclusion list of the antivirus, which seems to make things worse on Windows.

viktor-ferenczi commented 6 years ago

During the investigation I made changes to the build code (Python) to properly close all files everywhere in the Python code base by using the Python with statement for file context wherever it did not need too much refactoring. Explicit close calls are added in all other cases. Unfortunately this does not fix the original issue, but it is good practice anyway.

Please find the fixes mentioned above in the PR below.

Fixing this race condition on Windows by waiting for the dependencies to be actually readable would be the task of SCons, so the relevant fix should go into their code base.

Based on all the investigations above I suggest closing this ticket.

garyo commented 6 years ago

Great work tracking this down in so much detail, viktor-ferenczi! If you would like to go a little further down this rabbit hole, try running a build while using Procmon: https://docs.microsoft.com/en-us/sysinternals/downloads/procmon -- that should show you exactly why the file isn't openable. Maybe virus scanner has it open, or backup, or something else. Or maybe as you suggest, python isn't closing it. On Windows, only one process can have a file open at a time (in most situations).

viktor-ferenczi commented 6 years ago

Godot build log (with timestamps):

2018-03-17T11:38:35.204000 [27392]: SHELL: u'C:\\WINDOWS\\System32\\cmd.exe'Compiling ==> main\default_controller_mappings.gen.cpp
2018-03-17T11:38:35.249000 [25892]: CMD: cl /Fomain\default_controller_mappings.gen.windows.tools.64.obj /c "main\default_controller_mappings.gen.cpp" /TP /nologo /Z7 /Od /EHsc /MT /Gd /GR /w /EHsc -DZSTD_STATIC_LINKING_ONLY -DFT2_BUILD_LIBRARY -DZLIB_DEBUG -DFREETYPE_ENABLED -DFT_CONFIG_OPTION_USE_PNG -DRECAST_ENABLED -DSVG_ENABLED -DGLAD_ENABLED -DGLES_OVER_GL /DDEBUG_MEMORY_ALLOC /DSCI_NAMESPACE /DDEBUG_ENABLED /DDEBUG_MEMORY_ENABLED /DD3D_DEBUG_INFO /DWINDOWS_ENABLED /DOPENGL_ENABLED /DRTAUDIO_ENABLED /DWASAPI_ENABLED /DTYPED_METHOD_BIND /DWIN32 /DMSVC "/D{'_WIN32_WINNT': '0x0601', 'WINVER': '0x0601'}" /D_WIN64 /DPTRCALL_ENABLED /DTOOLS_ENABLED /DGDSCRIPT_ENABLED /DMINIZIP_ENABLED /DXML_ENABLED /Icore /Icore\math /Ieditor /Idrivers /I. /Iplatform\windows "/IC:\Program Files (x86)\Windows Kits\10\Include" "/IC:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\ATLMFC\include" "/IC:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\...
default_controller_mappings.gen.cpp
c1xx: fatal error C1083: Cannot open source file: 'main\default_controller_mappings.gen.cpp': Permission denied
2018-03-17T11:38:37.749000 [1720]:   |   | | +-main\default_controller_mappings.gen.cpp
2018-03-17T11:39:06.805000 [1720]:   | +-[main\default_controller_mappings.gen.cpp]

Process Monitor events filtered on default_controller_mappings.gen.cpp:

11:38:34.7608493    python.exe  24660   QueryOpen   G:\godot\main\default_controller_mappings.gen.cpp   NAME NOT FOUND  
11:38:34.7727775    python.exe  24660   QueryOpen   G:\godot\main\default_controller_mappings.gen.cpp   NAME NOT FOUND  
11:38:34.8151715    python.exe  24660   CreateFile  G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
11:38:34.8153579    python.exe  24660   QueryInformationVolume  G:\godot\main\default_controller_mappings.gen.cpp   BUFFER OVERFLOW VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ramৎ
11:38:34.8153733    python.exe  24660   QueryAllInformationFile G:\godot\main\default_controller_mappings.gen.cpp   BUFFER OVERFLOW CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:34, ChangeTime: 17/03/2018 11:38:34, FileAttributes: A, AllocationSize: 0, EndOfFile: 0, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x2000000001a70, EaSize: 0, Access: Generic Write, Read Attributes, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
11:38:34.8234762    python.exe  24660   WriteFile   G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Offset: 0, Length: 4,111, Priority: Normal
...
11:38:34.8995967    python.exe  24660   WriteFile   G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Offset: 258,846, Length: 4,111, Priority: Normal
11:38:34.8996605    python.exe  24660   ReadFile    G:\godot\main\default_controller_mappings.gen.cpp   END OF FILE Offset: 262,144, Length: 32,768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
11:38:34.9006556    python.exe  24660   WriteFile   G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Offset: 262,957, Length: 4,113
...
11:38:34.9123225    python.exe  24660   WriteFile   G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Offset: 320,515, Length: 1,875
11:38:34.9716933    python.exe  24660   QueryOpen   G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:34, ChangeTime: 17/03/2018 11:38:34, AllocationSize: 327,680, EndOfFile: 322,390, FileAttributes: A
11:38:34.9718483    python.exe  24660   QueryOpen   G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:34, ChangeTime: 17/03/2018 11:38:34, AllocationSize: 327,680, EndOfFile: 322,390, FileAttributes: A
11:38:34.9721116    python.exe  24660   CreateFile  G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened
11:38:34.9721715    python.exe  24660   QueryInformationVolume  G:\godot\main\default_controller_mappings.gen.cpp   BUFFER OVERFLOW VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ramৎ
11:38:34.9721837    python.exe  24660   QueryAllInformationFile G:\godot\main\default_controller_mappings.gen.cpp   BUFFER OVERFLOW CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:34, ChangeTime: 17/03/2018 11:38:34, FileAttributes: A, AllocationSize: 327,680, EndOfFile: 322,390, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x2000000001a70, EaSize: 0, Access: Generic Read, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
11:38:34.9722100    python.exe  24660   ReadFile    G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Offset: 0, Length: 65,536, Priority: Normal
11:38:34.9723989    python.exe  24660   ReadFile    G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Offset: 65,536, Length: 65,536
11:38:34.9725565    python.exe  24660   ReadFile    G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Offset: 131,072, Length: 65,536
11:38:34.9727488    python.exe  24660   ReadFile    G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Offset: 196,608, Length: 65,536
11:38:34.9729153    python.exe  24660   ReadFile    G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Offset: 262,144, Length: 60,246
11:38:34.9729327    python.exe  24660   ReadFile    G:\godot\main\default_controller_mappings.gen.cpp   END OF FILE Offset: 322,390, Length: 4,096
11:38:34.9730683    python.exe  24660   ReadFile    G:\godot\main\default_controller_mappings.gen.cpp   END OF FILE Offset: 322,390, Length: 65,536
11:38:34.9731062    python.exe  24660   CloseFile   G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS 
11:38:35.1611191    python.exe  24660   CreateFile  G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened
11:38:35.1612588    python.exe  24660   QueryInformationVolume  G:\godot\main\default_controller_mappings.gen.cpp   BUFFER OVERFLOW VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ramৎ
11:38:35.1612716    python.exe  24660   QueryAllInformationFile G:\godot\main\default_controller_mappings.gen.cpp   BUFFER OVERFLOW CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:34, ChangeTime: 17/03/2018 11:38:34, FileAttributes: A, AllocationSize: 327,680, EndOfFile: 322,390, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x2000000001a70, EaSize: 0, Access: Generic Read, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
11:38:35.1612898    python.exe  24660   QueryInformationVolume  G:\godot\main\default_controller_mappings.gen.cpp   BUFFER OVERFLOW VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ramৎ
11:38:35.1612972    python.exe  24660   QueryAllInformationFile G:\godot\main\default_controller_mappings.gen.cpp   BUFFER OVERFLOW CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:34, ChangeTime: 17/03/2018 11:38:34, FileAttributes: A, AllocationSize: 327,680, EndOfFile: 322,390, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x2000000001a70, EaSize: 0, Access: Generic Read, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
11:38:35.1613175    python.exe  24660   ReadFile    G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Offset: 0, Length: 319,488, Priority: Normal
11:38:35.1613876    python.exe  24660   ReadFile    G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Offset: 319,488, Length: 2,902
11:38:35.1614067    python.exe  24660   ReadFile    G:\godot\main\default_controller_mappings.gen.cpp   END OF FILE Offset: 322,390, Length: 4,096
11:38:35.1614622    python.exe  24660   CloseFile   G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS 
11:38:35.3236414    cl.exe  28964   QueryDirectory  G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS Filter: default_controller_mappings.gen.cpp, 1: default_controller_mappings.gen.cpp
11:38:35.3237172    cl.exe  28964   CreateFile  G:\godot\main\default_controller_mappings.gen.cpp   SHARING VIOLATION   Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, AllocationSize: n/a
11:38:35.5890745    cmd.exe 24148   CloseFile   G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS 
11:38:35.6872005    python.exe  24660   QueryOpen   G:\godot\main\default_controller_mappings.gen.cpp   SUCCESS CreationTime: 17/03/2018 11:38:34, LastAccessTime: 17/03/2018 11:38:34, LastWriteTime: 17/03/2018 11:38:35, ChangeTime: 17/03/2018 11:38:35, AllocationSize: 323,584, EndOfFile: 322,390, FileAttributes: A

G: is a ramdisk with no backup. G:\godot build folder was excluded from all Bitdefender Antivirus scanning, it was verified by seeing the lack of access from its vsserv.exe process.

BUFFER OVERFLOW is normal and harmless. Some repetitive WriteFile operations were removed for clarity. The very last QueryOpen comes from my extra check for the readability of the file.

Next steps:

garyo commented 6 years ago

Sharing violation just means that by default, only one process can have a file open on Windows. Python still has it open (for reading), and the compiler is trying to open it. (https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858(v=vs.85).aspx may help.)

So the question is why Python (SCons) still has it open. I know SCons has had this kind of problem in the past; here's a (very!) old discussion of something quite similar: http://scons-users.scons.narkive.com/pjMTzf5v/race-condition-in-scons-scheduler - you can see it could be a number of things, from python not closing the file until it collects garbage, to a bug in the python function that's creating default_controller_mappings.gen.cpp (could you print a timestamp when that finishes, to correlate with the procmon log?), to a problem with the include-file scanner. Here's another old report of a similar thing: http://scons.1086193.n5.nabble.com/MSVC-Parallell-Builds-td23950.html

So if with your help we could nail this one, it would be amazing. I'm going to mention this on the SCons irc as well.

One other question: I know python made significant changes to the IO system in 3.x. I wonder if this problem would go away if you run SCons with python3? Oh, in any case you should use SCons 3.0.1. It's still compatible with Python 2.7.

Calinou commented 6 years ago

One other question: I know python made significant changes to the IO system in 3.x. I wonder if this problem would go away if you run SCons with python3? Oh, in any case you should use SCons 3.0.1. It's still compatible with Python 2.7.

I'm still running into this issue with Python 3.6.4 and SCons 3.0.1 on Windows 10.

viktor-ferenczi commented 6 years ago

@garyo

python not closing the file until it collects garbage

Fix for that was in my PR, thanks for merging it. I use latest master for my tests.

One other question: I know python made significant changes to the IO system in 3.x.

Python 2.7.14 is used in my tests to avoid any such issues. I think it is best to separate the problems as much as possible. When we nail this with Python 2.7, then I will give it a test on latest 3.x as well.

viktor-ferenczi commented 6 years ago

This time it happened with: splash.gen.h

Build log (with scons log timestamp hack):

2018-03-17T14:19:33.240000 [43524]: scons: building `main\splash.gen.h' because it doesn't exist
2018-03-17T14:19:33.240000 [43524]: Preparing target main\splash.gen.h...
2018-03-17T14:19:33.241000 [38480]: make_splash(["main\splash.gen.h"], ["main\splash.png"])
2018-03-17T14:19:34.264000 [38480]: Command execution time: main\splash.gen.h: 1.023000 seconds
main\main.cpp(45): fatal error C1083: Cannot open include file: 'splash.gen.h': No such file or directory

Process monitor events:

14:19:32.3982270    python.exe  52576   QueryOpen   G:\godot\main\splash.gen.h  NAME NOT FOUND  
14:19:32.3985416    python.exe  52576   QueryOpen   G:\godot\main\splash.gen.h  NAME NOT FOUND  
14:19:32.6686124    python.exe  52576   QueryOpen   G:\godot\main\splash.gen.h  NAME NOT FOUND  
14:19:33.2427370    python.exe  52576   CreateFile  G:\godot\main\splash.gen.h  SUCCESS Desired Access: Generic Write, Read Attributes, Disposition: OverwriteIf, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: 0, OpenResult: Created
14:19:33.2429277    TGitCache.exe   7324    QueryOpen   G:\godot\main\splash.gen.h  SUCCESS CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, AllocationSize: 0, EndOfFile: 0, FileAttributes: A
14:19:33.2429297    python.exe  52576   QueryInformationVolume  G:\godot\main\splash.gen.h  BUFFER OVERFLOW VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ram8胆뷛
14:19:33.2429414    python.exe  52576   QueryAllInformationFile G:\godot\main\splash.gen.h  BUFFER OVERFLOW CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, FileAttributes: A, AllocationSize: 0, EndOfFile: 0, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x4000000001a6e, EaSize: 0, Access: Generic Write, Read Attributes, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
14:19:33.2431568    TGitCache.exe   7324    QueryOpen   G:\godot\main\splash.gen.h  SUCCESS CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, AllocationSize: 0, EndOfFile: 0, FileAttributes: A
14:19:33.3142694    python.exe  52576   WriteFile   G:\godot\main\splash.gen.h  SUCCESS Offset: 0, Length: 5,119, Priority: Normal
14:19:33.3143777    python.exe  52576   WriteFile   G:\godot\main\splash.gen.h  SUCCESS Offset: 5,119, Length: 99
14:19:33.3690732    python.exe  52576   WriteFile   G:\godot\main\splash.gen.h  SUCCESS Offset: 5,218, Length: 4,992, Priority: Normal
14:19:33.3930376    TGitCache.exe   7324    QueryOpen   G:\godot\main\splash.gen.h  SUCCESS CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, AllocationSize: 16,384, EndOfFile: 10,210, FileAttributes: A
14:19:33.4052584    TGitCache.exe   7324    QueryOpen   G:\godot\main\splash.gen.h  SUCCESS CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, AllocationSize: 16,384, EndOfFile: 10,210, FileAttributes: A
14:19:33.4236712    python.exe  52576   WriteFile   G:\godot\main\splash.gen.h  SUCCESS Offset: 10,210, Length: 4,999
...
14:19:34.2640760    python.exe  52576   WriteFile   G:\godot\main\splash.gen.h  SUCCESS Offset: 115,173, Length: 2,212
14:19:34.5245007    python.exe  52576   QueryOpen   G:\godot\main\splash.gen.h  SUCCESS CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, AllocationSize: 196,608, EndOfFile: 117,385, FileAttributes: A
14:19:34.5247093    python.exe  52576   QueryOpen   G:\godot\main\splash.gen.h  SUCCESS CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, AllocationSize: 196,608, EndOfFile: 117,385, FileAttributes: A
14:19:34.5250763    python.exe  52576   CreateFile  G:\godot\main\splash.gen.h  SUCCESS Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened
14:19:34.5251513    python.exe  52576   QueryInformationVolume  G:\godot\main\splash.gen.h  BUFFER OVERFLOW VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ram᭲
14:19:34.5251670    python.exe  52576   QueryAllInformationFile G:\godot\main\splash.gen.h  BUFFER OVERFLOW CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, FileAttributes: A, AllocationSize: 196,608, EndOfFile: 117,385, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x4000000001a6e, EaSize: 0, Access: Generic Read, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
14:19:34.5251994    python.exe  52576   ReadFile    G:\godot\main\splash.gen.h  SUCCESS Offset: 0, Length: 65,536, Priority: Normal
14:19:34.5253904    python.exe  52576   ReadFile    G:\godot\main\splash.gen.h  SUCCESS Offset: 65,536, Length: 51,849
14:19:34.5254283    python.exe  52576   ReadFile    G:\godot\main\splash.gen.h  END OF FILE Offset: 117,385, Length: 12,288
14:19:34.5256021    python.exe  52576   ReadFile    G:\godot\main\splash.gen.h  END OF FILE Offset: 117,385, Length: 65,536
14:19:34.5256326    python.exe  52576   CloseFile   G:\godot\main\splash.gen.h  SUCCESS 
14:19:34.5393288    python.exe  52576   CreateFile  G:\godot\main\splash.gen.h  SUCCESS Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, Write, AllocationSize: n/a, OpenResult: Opened
14:19:34.5394374    python.exe  52576   QueryInformationVolume  G:\godot\main\splash.gen.h  BUFFER OVERFLOW VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ram᮷
14:19:34.5394542    python.exe  52576   QueryAllInformationFile G:\godot\main\splash.gen.h  BUFFER OVERFLOW CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, FileAttributes: A, AllocationSize: 196,608, EndOfFile: 117,385, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x4000000001a6e, EaSize: 0, Access: Generic Read, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
14:19:34.5394938    python.exe  52576   QueryInformationVolume  G:\godot\main\splash.gen.h  BUFFER OVERFLOW VolumeCreationTime: 17/03/2018 11:15:21, VolumeSerialNumber: C6DB-9B59, SupportsObjects: True, VolumeLabel: Ramᮧ
14:19:34.5395311    python.exe  52576   QueryAllInformationFile G:\godot\main\splash.gen.h  BUFFER OVERFLOW CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:33, ChangeTime: 17/03/2018 14:19:33, FileAttributes: A, AllocationSize: 196,608, EndOfFile: 117,385, NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 0x4000000001a6e, EaSize: 0, Access: Generic Read, Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Byte
14:19:34.5395981    python.exe  52576   ReadFile    G:\godot\main\splash.gen.h  SUCCESS Offset: 0, Length: 114,688, Priority: Normal
14:19:34.5396528    python.exe  52576   ReadFile    G:\godot\main\splash.gen.h  SUCCESS Offset: 114,688, Length: 2,697
14:19:34.5396742    python.exe  52576   ReadFile    G:\godot\main\splash.gen.h  END OF FILE Offset: 117,385, Length: 4,096
14:19:34.5397086    python.exe  52576   CloseFile   G:\godot\main\splash.gen.h  SUCCESS 
14:19:35.2264735    cl.exe  3956    QueryDirectory  G:\godot\main\splash.gen.h  SUCCESS Filter: splash.gen.h, 1: splash.gen.h
14:19:35.2265730    cl.exe  3956    CreateFile  G:\godot\main\splash.gen.h  SHARING VIOLATION   Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, AllocationSize: n/a
14:19:35.2266291    cl.exe  3956    QueryDirectory  G:\godot\core\math\splash.gen.h NO MORE FILES   Filter: splash.gen.h
14:19:35.2267001    cl.exe  3956    QueryOpen   G:\godot\core\math\splash.gen.h NAME NOT FOUND  
14:19:35.2267691    cl.exe  3956    QueryDirectory  C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\include\splash.gen.h NO MORE FILES   Filter: splash.gen.h
14:19:35.2270577    cl.exe  3956    CreateFile  C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\include\splash.gen.h NAME NOT FOUND  Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a
14:19:35.3311607    cl.exe  3956    QueryDirectory  C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\include\splash.gen.h NO MORE FILES   Filter: splash.gen.h
14:19:35.3314719    cl.exe  3956    CreateFile  C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\include\splash.gen.h NAME NOT FOUND  Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a
14:19:35.4549549    cmd.exe 1900    CloseFile   G:\godot\main\splash.gen.h  SUCCESS 
14:19:35.4552071    TGitCache.exe   7324    QueryOpen   G:\godot\main\splash.gen.h  SUCCESS CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:35, ChangeTime: 17/03/2018 14:19:35, AllocationSize: 118,784, EndOfFile: 117,385, FileAttributes: A
14:19:35.4555465    TGitCache.exe   7324    QueryOpen   G:\godot\main\splash.gen.h  SUCCESS CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:35, ChangeTime: 17/03/2018 14:19:35, AllocationSize: 118,784, EndOfFile: 117,385, FileAttributes: A
14:19:35.6042221    TGitCache.exe   7324    QueryOpen   G:\godot\main\splash.gen.h  SUCCESS CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:35, ChangeTime: 17/03/2018 14:19:35, AllocationSize: 118,784, EndOfFile: 117,385, FileAttributes: A
14:19:35.6182531    TGitCache.exe   7324    QueryOpen   G:\godot\main\splash.gen.h  SUCCESS CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:35, ChangeTime: 17/03/2018 14:19:35, AllocationSize: 118,784, EndOfFile: 117,385, FileAttributes: A
14:19:35.6784076    python.exe  52576   QueryOpen   G:\godot\main\splash.gen.h  SUCCESS CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:35, ChangeTime: 17/03/2018 14:19:35, AllocationSize: 118,784, EndOfFile: 117,385, FileAttributes: A
14:19:35.6795939    python.exe  52576   QueryOpen   G:\godot\main\splash.gen.h  SUCCESS CreationTime: 17/03/2018 14:19:33, LastAccessTime: 17/03/2018 14:19:33, LastWriteTime: 17/03/2018 14:19:35, ChangeTime: 17/03/2018 14:19:35, AllocationSize: 118,784, EndOfFile: 117,385, FileAttributes: A

CL.EXE experiences the SHARING VIOLATION about 700ms later than Python closed the file. Time difference should not even count, only the order of events is important here.

14:19:35.2265730    cl.exe  3956    CreateFile  G:\godot\main\splash.gen.h  SHARING VIOLATION   Desired Access: Generic Read, Disposition: Open, Options: Synchronous IO Non-Alert, Non-Directory File, Attributes: N, ShareMode: Read, AllocationSize: n/a

Why on Earth cmd.exe is keeping the splash.gen.h file open?

Also, there is no CreateFile event for cmd.exe with Path splash.gen.h, so it did not open that file, only closed it. Strange...

14:19:35.4549549    cmd.exe 1900    CloseFile   G:\godot\main\splash.gen.h  SUCCESS 

TortoiseGit's cache seems to find it as well, but that happens only after cmd.exe closes it.

Next step is to figure whether TortoiseGit's cache have anything to do with it.

viktor-ferenczi commented 6 years ago

Turned off both TortoiseHg's and TortoiseGit's icon overlay cache and verified that their processes are not running anymore. Repeated the test and it failed with splash.gen.h again the exact same way. So the icon overlay caches did not cause the issue.

garyo commented 6 years ago

Interesting! How is cmd.exe even involved here... and why is there no CreateFile. interesting questions. SCons will try to remove the old version before building it, but I'm pretty sure that's done directly via os.unlink(). Also note SCons opened (CREATE_FILE) it with ShareMode read and write. A bit unusual, but OK. And also maybe why it worked at all, if cmd.exe had it open already.

If you still have the whole procmon log, maybe seeing how cmd.exe (pid 3956 there) got started would give a hint. Was it started from python/SCons? Must have been, I'd think. SCons will run cmd.exe to execute shell commands. --taskmastertrace=- might show something.

viktor-ferenczi commented 6 years ago

cmd.exe

Before each test I clean the build tree by running scons -c, so all the generated files are removed before the actual build. I've just verified it:

2018-03-17T15:20:35.963000 [51256]: Removed main\splash.gen.h

Also note SCons opened (CREATE_FILE) it with ShareMode read and write. A bit unusual, but OK.

That's caused by some extra code I hacked directly into SCons to explicitly verify that all target files generated can be opened and fully read. This is why python.exe appears to open the file again in ShareMode Read, Write, then read it all with a 64kB chunk size. So the splash.gen.h file was there and fully readable. But not for CL.EXE... Question is, why??

Also, if I run the exact same command line after the build fails, then CL.EXE can successfully read the file. So it might be something with the parent process (SCons's python.exe or some intermediate cmd.exe). It more and more looks like a generic SCons issue on Windows.

garyo commented 6 years ago

But not for CL.EXE... Question is, why??

One would assume it's because of cmd.exe having it open. But that would imply that whatever cmd.exe did to open it failed to show up in the procmon log. And it would imply that cmd.exe didn't have it open when SCons/python was generating it. (Oh, maybe you should run procmon as Adminstrator, if you're not already. Probably not the smoking gun, but best to be sure.)

viktor-ferenczi commented 6 years ago

Here is how cmd.exe gets into the picture:

SCons is apparently invoking CL.EXE using a modified (wrapped) version of os.spawnve to make it thread safe by using a lock. SCons is invoking os.spawnve with os.P_WAIT and passes over the full environment (too long to quote here).

See C:\Python27\scons-3.0.0\SCons\Platform\win32.py, search for: def spawnve(mode, file, args, env):

Pretty printed (space joined) command line of the failing CL.EXE invocation:

C:\WINDOWS\System32\cmd.exe /C "cl /Fomain\main.windows.tools.64.obj /c "main\main.cpp" /TP /nologo /Z7 /Od /EHsc /MT /Gd /GR /w /EHsc -DZSTD_STATIC_LINKING_ONLY -DFT2_BUILD_LIBRARY -DZLIB_DEBUG -DFREETYPE_ENABLED -DFT_CONFIG_OPTION_USE_PNG -DRECAST_ENABLED -DSVG_ENABLED -DGLAD_ENABLED -DGLES_OVER_GL /DDEBUG_MEMORY_ALLOC /DSCI_NAMESPACE /DDEBUG_ENABLED /DDEBUG_MEMORY_ENABLED /DD3D_DEBUG_INFO /DWINDOWS_ENABLED /DOPENGL_ENABLED /DRTAUDIO_ENABLED /DWASAPI_ENABLED /DTYPED_METHOD_BIND /DWIN32 /DMSVC "/D{'_WIN32_WINNT': '0x0601', 'WINVER': '0x0601'}" /D_WIN64 /DPTRCALL_ENABLED /DTOOLS_ENABLED /DGDSCRIPT_ENABLED /DMINIZIP_ENABLED /DXML_ENABLED /Icore /Icore\math /Ieditor /Idrivers /I. /Iplatform\windows "/IC:\Program Files (x86)\Windows Kits\10\Include" "/IC:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\ATLMFC\include" "/IC:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\VC\Tools\MSVC\14.12.25827\include" "/IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "/IC:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0\ucrt" "/IC:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0\shared" "/IC:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0\um" "/IC:\Program Files (x86)\Windows Kits\10\Include\10.0.16299.0\winrt" /Ithirdparty\zstd /Ithirdparty\zstd\common /Ithirdparty\zlib /Ithirdparty\rtaudio /Ithirdparty\glad /Ithirdparty\freetype /Ithirdparty\freetype\include /Ithirdparty\libpng /Ithirdparty\recastnavigation\Recast /Ithirdparty\recastnavigation\Recast\Include /Ithirdparty\nanosvg"

This is originally passed as a list, which would be unreadable for us humans. Please note that it makes quoting incorrect above.

Invoking this same command line after the build does succeed, even with wrong quoting.

Since no generated header file is listed on the command line cmd.exe cannot know about it, unless it makes a directory listing (which I don't see on ProcMon).

Apparently cmd.exe keeps its working directory (G:\godot in my case) open and closes every single generated file (both cpp and h) we make.

viktor-ferenczi commented 6 years ago

Summary on all the workarounds attempted:

Changing the execution environment

  1. Disabling Antivirus, TortoiseHg and TortoiseGit icon overlay caches, disabling backup: Build still fails
  2. Invoking the whole build with env var set PYTHONUNBUFFERED=1 set: Build still fails
  3. Invoking the build with -j1 (single process): Build still fails, but makes the problem less likely

Changing SCons source code

  1. Adding gc.collect() before and after the spawnve call: Build still fails
  2. Adding time.sleep(1.0) before each spawnve call: Build completes, but awful slow.

Changing Godot build scripts

  1. Writing the target file with a .new suffix in the generator, closing the file, then renaming it. (Tested only in main/SCsub, since the files generated there frequently cause this build failure.) It resulted in the following error right on renaming:
    2018-03-17T16:54:44.859000 [21436]: scons: *** [main\default_controller_mappings.gen.cpp] G:\godot\main\default_controller_mappings.gen.cpp.new: The process cannot access the file because it is being used by another process

Both os.rename and shutil.move were attempted with the same result.

So files written cannot be renamed right after closing them.

I think this is a step in the right direction to figure out what's going on.

garyo commented 6 years ago

That sounds right, in terms of how SCons invokes subprocesses. But as you say, it's unlikely that cmd.exe is the one opening the generated file. It just gets /c "some command" and executes it. There's one other possibility, which I thought was stamped out years ago but maybe not: inherited file handles. See this python bug: https://bugs.python.org/issue19575. Basically, in the presence of redirection, all open file handles get inherited by the subprocess. So if SCons has a file open, and invokes cmd.exe, maybe cmd.exe inherits that handle. I'm not saying it's definitely that, but it's a possibility.

I wonder why this never happens for me? I have a big AMD CPU and always build with -j30 (it's very nice), and haven't seen this happen. I'm using Win10, SSD, VS2017, python 3.6.3, SCons master branch from January. I'm building in my C drive, you're on G: -- probably not relevant, but who knows...

garyo commented 6 years ago

Here's one thing to try: In SCons Platform/win32.py there is code to turn off file-handle inheritance -- it monkey-patches BufferedReader and others. Search for HANDLE_FLAG_INHERIT. Maybe try to use that logic in main/SCsub when opening the generated files?

viktor-ferenczi commented 6 years ago

Thank you for the ideas. I'm trying to use _scons_open from SCons's Platform/win32.py as you suggested.

If that does not work, then we can try to run all of our Python build (make_*) functions in subprocesses, which would effectively decouple them from the SCons process making inheriting anything from there impossible. Based on former tickets this approach worked in case of shaders.

AMD Ryzen 7 1800X CPU here with 8 physical cores + HT, so the -j16 for me. Could be -j32 due to I/O wait, tough.

viktor-ferenczi commented 6 years ago

HANDLE_FLAG_INHERIT seems to work. UPDATE: It does not actually work. :(

Testing it in a loop many times, then creating a PR.

Higher build parallelism seems to decrease the likelihood of this issue. Reason can be the extra waiting time introduced between subsequent tasks by too much parallelism. Under-utilizing the CPU seems to maximize the likelihood of this issue. Going down to -j1 seems to decrease the probability, tough.

The less time is spent between finishing the Python generator and starting the CL.EXE compiler the more likely this problem is. It happens for sure if the gap is less than a few seconds. There seem to be a delay for closing files, even if we do it properly in Python.

viktor-ferenczi commented 6 years ago

The fix is not reliable, unfortunately. I was just lucky while testing the first 2-3 times, likely because of trying to have -j32 parallelism. Now it is running in a loop with -j16 and the build failed 20 out of 20 times.

Next: Trying to run all generator functions in their own subprocesses.

viktor-ferenczi commented 6 years ago

Running all builders (make_* functions) in subprocesses works well. I've seen this suggestion way above in the chat, actually. Too bad we don't know the exact reason for this problem, but at least there is a reliable workaround now.

Running 20-30 test builds, expect 100% success rate. I will also try with -j1 and -j32. Then a PR will follow.

viktor-ferenczi commented 6 years ago

Build benchmark

Average successful build times in seconds by build parallelism (-j option):

-j1:  794s
-j2:  416s
-j4:  228s
-j8:  166s
-j16: 154s
-j32: 152s

All builds started from clean state and running on a ramdisk. Cleaning was not counted for the build time.

Build machine CPU: Ryzen 7 1800X RAM: 32GB 2400MHz (suboptimal, I know) Motherboard: ASUS ROG Crosshair VI Hero, Socket-AM4 SSD: Samsung 850 Pro 1TB (SATA) OS: Windows 10 Pro 64bit ENU

Griefchief commented 6 years ago

" Running all builders (make_* functions) in subprocesses works well. I've seen this suggestion way above in the chat, actually. Too bad we don't know the exact reason for this problem "

Reading through your work today has given me somewhat of an inspiration today on the reason... I remember using sleep() and fixing issues that way while testing... the 7mb file (the ssl certs one) needs to be slept for about a second or 2 to solve the problem, compared to a really low time for the smaller files... I don't know enough about python&win internals but I think the reason the 7mb file gets released much slower is that it's possibly still being written to the disk by the filesystem/driver, and is therefore not available for consumption by another requester... Python releases it's handle too soon( or too late/never, whichever the reason is), or it simply informs scons too quickly for some reason or other and scons can't then know that the build will fail when another process requests the (bigger file)...

So, anyway, the reason why it was random failing for you is that that "release" time (wherever it is being released) is sometimes 250ms sometimes faster, sometimes slower, depending on the file (size) etc, so if scons is "too fast" via another thread (within the xxx ms) compiler just won't see it when it tries to access it... Sub-process forces a close on the handles when it is quiting itself, or the python (sub)proces simply does not close until it's sure that it has actually released the handle and therefore avoids the problem? I know this isn't an actuall answer, sorry about that, I'm just giving ideas and my exp with this stuff into print.

Also, I just wanted thank you for busting your head on this stuff, I wasn't able to work on Godot internals since the last time I commented above, a year or so ago. With this out of the way I can probably move on other interesting stuff with the vsproj generation, like the paths seemingly not being inputed properly into it, so VS can't find many of the files (the end of the SConstruct file puts those paths in when it generates the projects)... If that wasn't already fixed... Or who knows, maybe even Godot internals.

I just wanted to thank you for "unblocking" me (mentally), in a way, to mess with other Godot's stuff :) I'm glad to see somebody fixed this before me.

On Sun, Mar 18, 2018 at 12:51 AM, Viktor Ferenczi notifications@github.com wrote:

Build benchmark

Average successful build times in seconds by build parallelism (-j option):

-j1: 794s -j2: 416s -j4: 228s -j8: 166s -j16: 154s -j32: 152s

All builds started from clean state and running on a ramdisk. Cleaning was not counted for the build time.

Build machine CPU: Ryzen 7 1800X RAM: 32GB 2400MHz (suboptimal, I know) Motherboard: ASUS ROG Crosshair VI Hero, Socket-AM4 SSD: Samsung 850 Pro 1TB (SATA) OS: Windows 10 Pro 64bit ENU

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/godotengine/godot/issues/5042#issuecomment-373961404, or mute the thread https://github.com/notifications/unsubscribe-auth/AFfbCCoRXtxtuld5tyvlh8S7FCyf3QTMks5tfaGAgaJpZM4IuTZH .

-- Thanks,

Aleksandar Danilovic.

bdbaddog commented 6 years ago

Take a look at this: (Python's implementation of close) https://github.com/python/cpython/blob/f320be77ffb73e3b9e7fc98c37b8df3975d84b40/Modules/_io/fileio.c#L106

Note the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS around the actual C call to close() :

        Py_BEGIN_ALLOW_THREADS
        _Py_BEGIN_SUPPRESS_IPH
        err = close(fd);
        if (err < 0)
            save_errno = errno;
        _Py_END_SUPPRESS_IPH
        Py_END_ALLOW_THREADS

My belief is that the GIL gets released while the file is closing, and randomly the written (or copied) file get's open()'d either by python or a subprocess before the close() call actually completes.. (Note I'm the other SCons Co-Manager along with @garyo )

We've had patches and SCons users swapping to use win32's FileCopy API instead of shutil.copy() with good results. In those cases users were mainly seeing file handle open issue with files pulled in from a CacheDir and/or being installed or Copy()'d by scons.

Again my hypothesis would hold as the FileCopy call via ctypes (or pywin32) would hold the GIL until it returns which shouldn't happen until the file copy has completed.

lawnjelly commented 5 years ago

Following #20373 I'm not sure whether this issue is completely solved, but maybe it is effectively so rare now as to not be a problem. However I have some potentially relevant info.

Following my work adding unity build #13096 option to Godot, I was getting these include errors for the generated files each clean / rebuild cycle. Whether it was the same mechanism for the error I don't know however, I can show how I cured it in the unity build...

Using the SCsub in main as an example, currently my replacement is: old:

env.add_source_files(env.main_sources, "*.cpp")

new:

if env['unity']:
    env.add_source_files(env.main_sources, "#misc/scu/SCU_main.cc")
    env.Depends("#misc/scu/SCU_main.cc", "#main/app_icon.gen.h")
    env.Depends("#misc/scu/SCU_main.cc", "#main/splash.gen.h")
    env.Depends("#misc/scu/SCU_main.cc", "#main/splash_editor.gen.h")
else:
    env.add_source_files(env.main_sources, "*.cpp")

That is, as I understand it, instead of relying on Scons to find the dependency information (and determine the build order), I am explicitly saying that these generated files are needed as a dependency before SCU_main.cc is compiled.

I am of course assuming that we were currently relying on Scons to determine build order without hints such as this. As to why it was occurring more in the unity build, I don't know (perhaps the number of files in the compilation unit were overwhelming the parser Scons uses to auto detect dependencies), or it was limiting the multithreading of the compilation.

This leads me to suspect that the same problem may be existing in the main build (Scons not knowing order of dependencies) but it is being masked from showing up in most circumstances, and only showing up 'randomly' on some people's machines. If this is the case then the same steps to fix the unity build might fix the main build.

Other dependencies needed were: In editor:

    if env['unity']:
        env.add_source_files(env.editor_sources, "#misc/scu/SCU_editor.cc")
        env.Depends("#misc/scu/SCU_editor.cc", "#core/authors.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "#core/donors.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "#core/license.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "#core/io/certs_compressed.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "builtin_fonts.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "doc_data_compressed.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "translations.gen.h")
        env.Depends("#misc/scu/SCU_editor.cc", "editor_icons.gen.h")
    else:
        env.add_source_files(env.editor_sources, "*.cpp")

In modules/gdnative:

if env['unity']:
    env_gdnative.add_source_files(env.modules_sources, "#misc/scu/SCU_modules_gdnative.cc")
    env.Depends("#misc/scu/SCU_modules_gdnative.cc", "gdnative_api_struct.gen.cpp")
    env.Depends("#misc/scu/SCU_modules_gdnative.cc", "gdnative_api_struct.gen.h")
else:
    env_gdnative.add_source_files(env.modules_sources, "gdnative.cpp")
    env_gdnative.add_source_files(env.modules_sources, "register_types.cpp")
    env_gdnative.add_source_files(env.modules_sources, "android/*.cpp")
    env_gdnative.add_source_files(env.modules_sources, "gdnative/*.cpp")
    env_gdnative.add_source_files(env.modules_sources, "nativescript/*.cpp")
    env_gdnative.add_source_files(env.modules_sources, "gdnative_library_singleton_editor.cpp")
    env_gdnative.add_source_files(env.modules_sources, "gdnative_library_editor_plugin.cpp")

I think those were the main ones, I may have missed a couple. To modify this for the main build you might need to change the Depends sections so that the targets are cpp in the main build, or some approach like this.

Does this sound a plausible alternative explanation (in addition to the file locking thing)? If so I can have a think about how to apply the same dependencies to the main build. Has anyone still been having this problem since #17595 ?

bdbaddog commented 5 years ago

@lawnjelly - This shouldn't be necessary unless the names of the header files are defined via a preprocessor value like this:

#define MY_INCLUDE "abc.h"
#include MY_INCLUDE

My first guess would one of the following conditions apply:

  1. The builder creating those header files (assuming they are generated) doesn't properly inform scons that those files are being generated
  2. The CPPPATH doesn't include the path where those files are found
  3. Perhaps a #include "a/b/c/d.h"

In general if you have to add a Depends() for a header file something is wrong. This is not expected when using SCons.

BTW. You don't specify order to SCons, you specify dependencies and SCons determines order using that.

lawnjelly commented 5 years ago

@lawnjelly - This shouldn't be necessary unless the names of the header files are defined via a preprocessor value like this:

#define MY_INCLUDE "abc.h"
#include MY_INCLUDE

Ah, then ignore me, this may be what is causing the issue on the unity build. I am doing exactly this (for easy changes, the path is currently specified in a macro). However it might potentially make things easier if I can remove the Depends() by specifying non-macrod paths, I will try and test this. :+1: