Test suite failures on Windows GitHub runner

mwichmann commented 1 month ago

PR #4538 proposed adding a Windows runner on GitHub, since we've lost the AppVeyor builds without a reasonable explanation (supposedly because the tests were failing too often, but they fail because AppVeyor itself does not properly update a test image: the Visual Studio "license expired" problem).

Some surprises in trying to move the test run over... a bunch of unexpected failures, some rather mysterious. In the initial merge, a skip list has been added, but we'd like to get rid of that as much as possible.

test/CPPDEFINES/pkg-config.py
test/Interactive/added-include.py
test/Interactive/Alias.py
test/Interactive/basic.py
test/Interactive/cache-debug.py
test/Interactive/cache-disable.py
test/Interactive/cache-force.py
test/Interactive/cache-show.py
test/Interactive/clean.py
test/Interactive/configure.py
test/Interactive/Default-None.py
test/Interactive/Default.py
test/Interactive/exit.py
test/Interactive/failure.py
test/Interactive/help.py
test/Interactive/implicit-VariantDir.py
test/Interactive/option--Q.py
test/Interactive/option-i.py
test/Interactive/option-j.py
test/Interactive/option-k.py
test/Interactive/option-n.py
test/Interactive/option-s.py
test/Interactive/repeat-line.py
test/Interactive/shell.py
test/Interactive/tree.py
test/Interactive/unknown-command.py
test/Interactive/variant_dir.py
test/MSVC/msvc.py
test/packaging/msi/explicit-target.py
test/packaging/msi/file-placement.py
test/packaging/msi/package.py
test/packaging/tar/xz_packaging.py
test/scons-time/run/config/python.py
test/scons-time/run/option/python.py
test/scons-time/run/option/quiet.py
test/scons-time/run/option/verbose.py
test/sconsign/script/no-SConsignFile.py
test/sconsign/script/SConsignFile.py
test/sconsign/script/Signatures.py

mwichmann commented 1 month ago

Copying over some comments from the PR:

The sconsign tests fail because the third field of entries for files is -1, which doesn't match the regular expression \d+. But why is it getting -1?
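
The mismatch itself is easy to reproduce in isolation: \d+ matches only a run of digits, so a leading minus sign can never satisfy it. A minimal check follows; the field values are made up for illustration and are not taken from real sconsign output.

```python
import re

FIELD_RE = re.compile(r"\d+")  # the pattern the tests expect for that field


def field_matches(value: str) -> bool:
    """Return True if the whole field is an unsigned run of digits."""
    return FIELD_RE.fullmatch(value) is not None


print(field_matches("1718000000"))  # a made-up non-negative value
print(field_matches("-1"))          # the value seen on the runner
```

This only shows why the comparison fails; it says nothing about where the -1 comes from.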

The scons-time tests: the python.py ones fail because the "command echo" is missing from the files; the other two fail because we get back a path that has gone through 8.3 shortening, like:

< 'SConstruct file directory: C:\\\\Users\\\\runneradmin\\\\AppData\\\\Local\\\\Temp\\\\scons\\-time\\-[^\\\\]*\\\\foo'
---
> 'SConstruct file directory: C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\scons-time-pim4q763\\foo'

mwichmann commented 1 month ago

More comments:

Some packaging tests fail with:

scons: *** Missing Packagetag 'X_MSI_LANGUAGE' for SCons.Tool.packaging.msi packager

... even though the packaging tool itself sets that.

The tar-xz test fails because the identified tar doesn't know about xz format. Possibly another case of too generously detecting something.

mwichmann commented 1 month ago

The pkg-config test fails because perl is not found. The runner has a piece of software called Strawberry Perl installed, the initial test check finds it, and that bundle has a pkg-config.bat file that ends up calling Perl, but the path where things were found is not something SCons itself knows about, thus the error:

STDERR =========================================================================
'perl' is not recognized as an internal or external command,
operable program or batch file.
OSError: ' "C:/Strawberry/perl/bin/pkg-config.BAT" --cflags bug.pc' exited 9009:
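
The mechanism appears to be that a lookup succeeds against one search path while the command later runs with a different one. A rough sketch of that effect, using shutil.which with explicit search paths, is below. The tool name faketool and both directories are hypothetical stand-ins; this is not what SCons or the test actually does.

```python
import os
import shutil
import stat
import tempfile


def make_fake_tool(directory: str, name: str = "faketool") -> str:
    """Create an empty executable file so shutil.which() can find it."""
    # On Windows, which() matches on PATHEXT extensions such as .BAT.
    filename = name + ".bat" if os.name == "nt" else name
    path = os.path.join(directory, filename)
    with open(path, "w") as f:
        f.write("")
    # Add the owner-executable bit (needed on POSIX for which() to accept it).
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


with tempfile.TemporaryDirectory() as tool_dir, \
        tempfile.TemporaryDirectory() as other_dir:
    make_fake_tool(tool_dir)
    # Lookup with a search path that contains the tool: it is found.
    found_outer = shutil.which("faketool", path=tool_dir)
    # Lookup with a search path that lacks tool_dir: nothing is found.
    found_inner = shutil.which("faketool", path=other_dir)

print(found_outer is not None, found_inner is None)
```

In the failure above the same thing happens one level down: pkg-config.bat is located, but the perl it calls is not on the search path in effect when the command runs.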

mwichmann commented 1 month ago

The interactive tests almost all (26/28) fail waiting for a file to appear - it's always the result of calling the Touch action function. Example:

544/1278 (42.57%) C:\hostedtoolcache\windows\Python\3.12.3\x64\python.exe test\Interactive\Alias.py
STDOUT =========================================================================
scons>>> Copy("foo.out", "foo.in")
scons>>> Touch("1")
scons>>> 

timed out waiting for C:\Users\RUNNER~1\AppData\Local\Temp\scons\testcmd.4068.iy320bhc\1 to exist

This may be because the path has been munged by a DOS-style 8.3 transformation (there's another set of issues of this type)... the test runner username is actually runneradmin, which is longer than eight characters, so the home directory shows up as RUNNER~1.

mwichmann commented 1 month ago

Looks like the transformation is done by Windows itself. I created a throwaway account with the same name as the problem one, runneradmin. With that, not even getting Python involved, we see this in the environment settings:

TEMP=C:\Users\RUNNER~1\AppData\Local\Temp
TMP=C:\Users\RUNNER~1\AppData\Local\Temp

And in Python, tempfile.gettempdir() shows the same.

However, knowing that's where it comes from doesn't leave me any wiser as to the reason for things failing.
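
For reference, the link between the environment and Python can be checked directly. tempfile.gettempdir() consults TMPDIR, TEMP and TMP (in that order) and caches the answer in tempfile.tempdir, so a sketch has to reset that cache. The helper name below is hypothetical.

```python
import os
import tempfile


def gettempdir_with_env(value: str) -> str:
    """Return what tempfile.gettempdir() reports with TMPDIR/TEMP/TMP set to value."""
    names = ("TMPDIR", "TEMP", "TMP")
    saved_env = {name: os.environ.get(name) for name in names}
    saved_cache = tempfile.tempdir
    try:
        for name in names:
            os.environ[name] = value
        tempfile.tempdir = None  # drop the cached result so the env is re-read
        return tempfile.gettempdir()
    finally:
        # Restore the environment and the cached value.
        for name, old in saved_env.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
        tempfile.tempdir = saved_cache


with tempfile.TemporaryDirectory() as custom:
    reported = gettempdir_with_env(custom)
    print(reported == custom)
```

So whatever form the directory has in the environment (short or long) is the form Python hands back.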

mwichmann commented 1 month ago

I've actually found one real bug. Most of the tests are not failing in my VM using the same user ID. But in the test framework we have this function (abridged a bit for brevity):

def tempdir_re(self, *args):
    """Returns a regular expression to match a scons-time
    temporary directory."""
    tempdir = tempfile.gettempdir()
    try:
        realpath = os.path.realpath
    except AttributeError:
        pass
    else:
        tempdir = realpath(tempdir)

The problem is that if the tempdir is a Windows-squashed (8.3) path, realpath will un-squash it, and any test that builds path expectations from the un-squashed form will get mismatches with what Windows is actually using. Only two scons-time tests actually use this.
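
To illustrate with the values from the failing comparison earlier in this thread: a pattern built from the expanded directory cannot match output that still carries the short form. The sketch is pure string handling, so it runs anywhere. fake_realpath is a hypothetical stand-in for what os.path.realpath is described as doing on Windows, and tempdir_pattern is not the framework's actual function.

```python
import re


def tempdir_pattern(tempdir, resolve=None):
    """Build a regex for a scons-time-* directory directly under tempdir.

    resolve is an optional callable applied to tempdir first, playing
    the role of os.path.realpath in the abridged function above.
    """
    if resolve is not None:
        tempdir = resolve(tempdir)
    return re.escape(tempdir + "\\scons-time-") + r"[^\\]*"


def fake_realpath(path):
    # Hypothetical stand-in: expand the 8.3 component to the long name.
    return path.replace("RUNNER~1", "runneradmin")


short_tmp = r"C:\Users\RUNNER~1\AppData\Local\Temp"
reported = short_tmp + r"\scons-time-pim4q763"

# Pattern built from the expanded path vs. pattern built from the path as given.
with_resolve = re.fullmatch(tempdir_pattern(short_tmp, fake_realpath), reported)
without_resolve = re.fullmatch(tempdir_pattern(short_tmp), reported)
print(with_resolve is None, without_resolve is not None)
```

Whether to stop resolving the path or to normalize both sides of the comparison is a design choice this sketch does not settle.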

mwichmann commented 1 month ago

The packaging\msi tests fail if candle from the WiX Toolset is installed, so presumably those are buggy tests.

mwichmann commented 1 month ago

For informational purposes, this runner is described in .github/workflows/runtest-win.yml. It currently excludes the tests found to be failing by using an exclude file, since on another CI host a constantly failing test run led to that host cutting us off (even though the problem was of their making).

Two of the scons-time failures have now been eliminated (and taken out of the excludelist file) via merged #4542.

Test suite failures on Windows GitHub runner #4539