actions / runner-images

GitHub Actions runner images

New unexpected build failures on 20240603.1.0 #10004

Closed llvm-beanz closed 2 months ago

llvm-beanz commented 2 months ago

Description

Our PR builds had been working as expected until last night, when the runners updated to the 20240603.1.0 image. An impacted PR is here:

https://github.com/microsoft/DirectXShaderCompiler/pull/6668

Earlier iterations of the PR built successfully, but the builds began failing once the image updated.

See a failing build here: https://dev.azure.com/DirectXShaderCompiler/public/_build/results?buildId=6383&view=results

And a previously successful one here: https://dev.azure.com/DirectXShaderCompiler/public/_build/results?buildId=6371&view=results

Platforms affected

Runner images affected

Image version and build link

Image: 20240603.1.0 https://dev.azure.com/DirectXShaderCompiler/public/_build/results?buildId=6383&view=results

Is it a regression?

Yes; the previous image, 20240514.3.0, worked.

Expected behavior

Our builds should succeed, as they did previously: https://dev.azure.com/DirectXShaderCompiler/public/_build/results?buildId=6371&view=results

Actual behavior

The build fails with errors we don't encounter locally or on older VM images.

Repro steps

We cannot reproduce outside the VM image.

past-due commented 2 months ago

We are also seeing new issues in the 20240603.1.0 Windows Server 2022 image, on GitHub Actions - Standard Runners.

Specifically, in our case, a built executable raises an access violation when it is run in later stages of our build process.

Esvandiary commented 2 months ago

We're also seeing issues with windows-2022 runners starting sometime in the last two days; re-running previously successful jobs consistently results in failures. We're building a C++ project using CMake and VS2022.

In our case, the build process itself succeeds but trying to run any of the resulting executables later in the workflow fails with Access violation. Local builds of the same code do not exhibit any issues.

ScottTodd commented 2 months ago

We're also seeing similar issues on windows-2022 with this image.

The errors are all segfaults when trying to run executable files.

past-due commented 2 months ago

The issue may be due to a combination of the ordering of the PATH environment variable, older versions of vcruntime on the PATH (e.g. the one bundled with Python), and changes in the VS 2022 17.10 STL that require the latest vcruntime.

When running the following on windows-2022:20240603.1.0 I see:

(Get-Command vcruntime140.dll).Path    => C:\hostedtoolcache\windows\Python\3.9.13\x64\vcruntime140.dll
(Get-Command vcruntime140.dll).Version => 14.29.30139.0

But the system-installed version (at C:\Windows\system32\vcruntime140.dll) is 14.40.33810.00

As noted in the VS 2022 17.10 STL release notes:

Reference: https://github.com/microsoft/STL/releases/tag/vs-2022-17.10
Another reference: https://developercommunity.visualstudio.com/t/Access-violation-in-_Thrd_yield-after-up/10664660?sort=active

And, in fact, I can confirm that built executables run correctly on a VM where I've ensured that vcruntime140.dll version 14.40.33810.00 is what's loaded.
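For anyone who wants to check their own runner, here is a quick diagnostic sketch (PowerShell; it only reports which CRT DLLs win on PATH and their file versions):

# Report which copy of each VC++ runtime DLL resolves first via PATH, so that
# shadowed or outdated copies (e.g. the one bundled with Python) are easy to spot.
foreach ($dll in 'vcruntime140.dll', 'vcruntime140_1.dll', 'msvcp140.dll') {
    $hit = Get-Command $dll -ErrorAction SilentlyContinue
    if ($hit) {
        '{0} => {1} ({2})' -f $dll, $hit.Version, $hit.Path
    } else {
        '{0} => not found on PATH' -f $dll
    }
}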


Possible recommendations for windows images:

firthm01 commented 2 months ago

Also seeing segfaults running executables from a windows-latest (20240603.1.0) runner. It was fine on 20240514.3.0. Private repo, so I can't share runs, but it looks very similar to the other reports. In our case we're running pluginval.exe against some audio plugins built by the runner. Interestingly, it's not an immediate segfault; it runs through several test phases before failing.


Starting tests in: pluginval / Editor...
D:\a\_temp\7e7e13d1-ac8c-4be7-bda4-e292e2e5bb48.sh: line 3:  1317 Segmentation fault      ./pluginval.exe --strictness-level 10 --verbose --validate "build/plugin/[REDACTED].vst3"
##[error]Process completed with exit code 139.

Esvandiary commented 2 months ago

DLL confusion due to PATH issues makes a lot of sense - I had a build succeed just now simply by changing the workflow to build in Debug rather than Release. The executables would then be looking for d-suffixed DLLs, which wouldn't conflict.

rouault commented 2 months ago

I also see a regression on the GDAL (https://github.com/OSGeo/gdal) CI related to that change, causing crashes during test execution:

mkruskal-google commented 2 months ago

We hit this too and traced it back to std::mutex usage in https://github.com/abseil/abseil-cpp. This looks like https://developercommunity.visualstudio.com/t/Access-violation-in-_Thrd_yield-after-up/10664660#T-N10668856, which suggests an older incompatible version of msvcp140.dll is being used in this image.

We also only see this in optimized builds. Our debug-built executables still work.

The stacktrace we got:

*** SIGSEGV received at time=1717698962 ***
@ 00007FF638B1F1D2 (unknown) `__scrt_common_main_seh'::`1'::filt$0
@ 00007FF9C733EFF0 (unknown) _C_specific_handler
@ 00007FF9D5A843BF (unknown) _chkstk
@ 00007FF9D5A1186E (unknown) RtlVirtualUnwind2
@ 00007FF9D5A833AE (unknown) KiUserExceptionDispatcher
@ 00007FF9C72B3278 (unknown) Thrd_yield
@ 00007FF638B13566 (unknown) absl::lts_20240116::time_internal::cctz::time_zone::Impl::LoadTimeZone
@ 00007FF638B1245F (unknown) absl::lts_20240116::time_internal::cctz::local_time_zone
@ 00007FF638B1184E (unknown) absl::lts_20240116::InitializeLog

henryruhs commented 2 months ago

My Python code using ctypes.windll.kernel32.GetShortPathName to shorten paths stopped working when the windows-2022 runner updated to image version 20240603.1.0.

🟢 20240514.3.0: https://github.com/facefusion/facefusion/actions/runs/9401500581/job/25893418056
🔴 20240603.1.0: https://github.com/facefusion/facefusion/actions/runs/9407074170/job/25912005903

bduffany commented 2 months ago

20240603.1.0 broke us too; we're consistently seeing nondescript "failed to execute command" errors in some protoc.exe executions (google protobuf compiler).

rouault commented 2 months ago

I've attempted that in https://github.com/rouault/gdal/actions/runs/9407632196/job/25913839874, copying c:\Windows\system32\vcruntime140.dll into a specific directory and putting it at the front of the PATH, but it appears that the version in c:\Windows\system32\ is only 14.32.31326, and not >= 14.40.33810.00. I would argue that the runner image should be fixed to have a recent enough vcruntime140.dll at the front of the PATH.

A workaround I found while reading https://developercommunity.visualstudio.com/t/Access-violation-in-_Thrd_yield-after-up/10664660#T-N10668856 is that you can define "/D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR" when building your software to revert to a std::mutex constructor compatible with older vcruntimes: https://github.com/rouault/gdal/commit/c4ab31f0eb1c196dff2d561292a29d8840223790. My builds work fine with that workaround.

ScottTodd commented 2 months ago

Can we expect a rollback or some other fix to the runner images, or will affected projects need to use one of the listed workarounds? Is there a way to use older/stable runner images instead of this new version?

randombit commented 2 months ago

Seeing this also

Runner image 20240514.3.0 works fine

https://github.com/randombit/botan/actions/runs/9408188903/job/25918967600

Runner image 20240603.1.0, built binaries fail with error code 3221225477

https://github.com/randombit/botan/actions/runs/9408188903/job/25918967856

Same code in both cases; the first (working) run is the PR, the second (failing) is the merge of that PR into master.

YOU54F commented 2 months ago

Getting similar errors today with the new images, simply executing curl to download a binary. This affects both the windows-2019 and windows-2022 images. (I was using windows-latest, but tried windows-2019 with the same result.)

🔵  Downloading ffi v0.4.24 for pact_ffi-windows-x86_64.dll.gz
Error: Process completed with exit code 43.

Here is a re-run today, on the newer runner image, of a job that passed yesterday.

🟢 20240514.3.0 - https://github.com/YOU54F/pact-js-core/actions/runs/9392917514/job/25868610705#step:1:9
🔴 20240603.1.0 - https://github.com/YOU54F/pact-js-core/actions/runs/9392917514/job/25920895001#step:1:9

curl --output foo --write-out "%{http_code}" --location https://github.com/pact-foundation/pact-ruby-standalone/releases/download/v2.4.4/pact-2.4.4-windows-x86_64.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
curl: libcurl function was given a bad argument
000

I believe my error is https://github.com/curl/curl/issues/13845, probably due to the image update updating curl's dependencies.

ijunaidm commented 2 months ago

@llvm-beanz - Kindly re-run the pipeline now. If you face the same error again, please share the YAML script/snippet for the build task from the pipeline.

kubo990 commented 2 months ago

I also see failed pipelines on the new build [20240603.1.0]. Errors in the solution restore task:

C:\Program Files\dotnet\sdk\8.0.300\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.Sdk.FrameworkReferenceResolution.targets(154,5): error NETSDK1145: The Apphost pack is not installed and NuGet package restore is not supported. Upgrade Visual Studio, remove global.json if it specifies a certain SDK version, and uninstall the newer SDK. For more options visit https://aka.ms/targeting-apphost-pack-missing Pack Type:Apphost, Pack directory: C:\Program Files\dotnet\packs\Microsoft.NETCore.App.Host.win-x86, targetframework: net7.0, Pack PackageId: Microsoft.NETCore.App.Host.win-x86, Pack Package Version: 7.0.19 [D:\***.vcxproj]

Is this related to some issue on the host machine, or should some changes be made on our end? Projects use .NET 7, but the host machine seems to use .NET SDK 8.

kishorekumar-anchala commented 2 months ago

I also see failed pipelines in new build [20240603.1.0]. Errors in solution restore task: C:\Program Files\dotnet\sdk\8.0.300\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.Sdk.FrameworkReferenceResolution.targets(154,5): error NETSDK1145: The Apphost pack is not installed and NuGet package restore is not supported. Upgrade Visual Studio, remove global.json if it specifies a certain SDK version, and uninstall the newer SDK. For more options visit https://aka.ms/targeting-apphost-pack-missing Pack Type:Apphost, Pack directory: C:\Program Files\dotnet\packs\Microsoft.NETCore.App.Host.win-x86, targetframework: net7.0, Pack PackageId: Microsoft.NETCore.App.Host.win-x86, Pack Package Version: 7.0.19 [D:\***.vcxproj] Is it related to some issues on host machine or some changes should be done on our end? Projects use .NET 7, but host machine seems uses .NET SDK 8.

@kubo990, it would be helpful if you could share the vsbuild stage YAML script, and also confirm which SDK your application was developed with.

Arech commented 2 months ago

This has broken all Windows-based Azure pipelines in our company and everyone is halted. I wonder how much this will cost us...

henryruhs commented 2 months ago

I think a windows-next image would be a good idea in the future... Just roll it back, guys!

turboderp commented 2 months ago

I can add that this also breaks CUDA<12.5 builds.

C:\Miniconda3\envs\build\include\crt/host_config.h(153): fatal error C1189: #error:  -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.

I've been trying a bunch of things, and I'm most optimistic about explicitly downgrading VS as a build step:

      - run: choco upgrade visualstudio2022enterprise --version=117.9.7.0
        if: runner.os == 'Windows'

It still uses the more recent version, though, so I guess there's a PATH issue to resolve, or maybe I also need to uninstall VS 2022 Enterprise from the runner, but I don't seem to be able to do that via choco.

kubo990 commented 2 months ago

@kubo990, it would be helpful if you could share the vsbuild stage YAML script, and also confirm which SDK your application was developed with.

@kishorekumar-anchala The application targets the .NET 7 SDK. The workaround was adding a UseDotNet task to specify/install SDK 7.0; however, I don't think that should be needed, since according to the docs SDK 7.0 should still be installed on the windows-2022 agent.

- task: UseDotNet@2
  displayName: 'Use .NET Core sdk 7.x'
  inputs:
    version: 7.x

There are no special commands in the YAML pipeline; it fails on the NuGet restore task.

name: $(Build.BuildId)

pool:
  vmImage: 'windows-2022'

trigger:
  - 'develop'

pr:
  - 'develop'

steps:
*** (some unrelated PowerShell tasks)

- task: NuGetCommand@2
  displayName: 'Restore NuGet Packages'
  inputs:
    restoreSolution: 'SolutionName.sln'

*** build tasks


saran-t commented 2 months ago

There's a copy of vcruntime140.dll version 14.40.33810 in C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Redist\MSVC\14.40.33807\x64\Microsoft.VC143.CRT.

Adding this to GITHUB_PATH doesn't seem to help in my case; my tests are still failing with "unknown file: error: SEH exception with code 0xc0000005 thrown in the test body.", but perhaps this might be useful to someone else.
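For reference, the GITHUB_PATH approach looks roughly like this (a sketch; the 14.40.33807 directory is the redist path mentioned above):

# Directories added via $GITHUB_PATH are prepended to PATH for subsequent steps,
# so this copy of the CRT should win over the older copies listed earlier on PATH.
$redist = 'C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Redist\MSVC\14.40.33807\x64\Microsoft.VC143.CRT'
Add-Content -Path $env:GITHUB_PATH -Value $redist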

jviotti commented 2 months ago

Is there a workaround other than doing debug builds? This is heavily impacting the pipelines of many of our projects, and we definitely need Windows builds and packages on CI.

mwestphal commented 2 months ago

Also affected over at https://github.com/f3d-app/f3d/

saran-t commented 2 months ago

OK, copying all DLLs from C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Redist\MSVC\14.40.33807\x64\Microsoft.VC143.CRT to the same directory as the test binaries ~fixed~ sidestepped the issue for me.

I added the following step between building and testing; bin\Release is where all the test binaries live.

Copy-Item (Join-Path `
    ((Get-ChildItem -Directory `
        -Path "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Redist\MSVC\14.*" |
      Sort -Descending | Select-Object -First 1).FullName
    ) 'x64\Microsoft.VC143.CRT\*.dll') "bin\Release"
dneto0 commented 2 months ago

Aside from needing to fix the image, I guess this also means there should be a smoke test for any new image rollout.

dneto0 commented 2 months ago

@llvm-beanz - Kindly re-run the pipeline now. If you face the same error again, please share the YAML script/snippet for the build task from the pipeline.

I was the original reporter to @llvm-beanz

A colleague ran a similar pipeline starting about 20 minutes ago, for https://github.com/microsoft/DirectXShaderCompiler/pull/6679
The Azure pipeline for a VS2022 release build is: https://github.com/microsoft/DirectXShaderCompiler/pull/6679/checks?check_run_id=25949596545
It failed after 6 minutes, with raw log: https://dev.azure.com/DirectXShaderCompiler/5060fecd-942c-4880-bb20-b8b29e0cbfe1/_apis/build/builds/6389/logs/50

I believe the Azure pipeline is defined in https://github.com/microsoft/DirectXShaderCompiler/blob/main/azure-pipelines.yml

dneto0 commented 2 months ago

Looks like it affects another project.

This PR: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/pull/2598
This job: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/actions/runs/9419555155/job/25949562007?pr=2598
This workflow: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/blob/7d6d66975ff394a18e60db2e83aa872b65180248/.github/workflows/check-in-tree-build.yml#L110

It's doing a release build with VS2022, and the symptom is:

C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(254,5): error MSB8066: Custom build for 'D:\a\SPIRV-LLVM-Translator\SPIRV-LLVM-Translator\build\CMakeFiles\8bc6b7dc048b6551197d3c360a7ac54d\ARMTargetParserDef.inc.rule;D:\a\SPIRV-LLVM-Translator\SPIRV-LLVM-Translator\build\CMakeFiles\edf72b9a29a2179a19058e3c565d1071\ARMTargetParserTableGen.rule;D:\a\SPIRV-LLVM-Translator\SPIRV-LLVM-Translator\llvm-project\llvm\include\llvm\TargetParser\CMakeLists.txt' exited with code -1073741819. [D:\a\SPIRV-LLVM-Translator\SPIRV-LLVM-Translator\build\include\llvm\TargetParser\ARMTargetParserTableGen.vcxproj]

Basically, error MSB8066: Custom build ... exited with code -1073741819. I've seen that exit code across many affected jobs.

eXpl0it3r commented 2 months ago

We've run into this issue as well with SFML (example: https://github.com/SFML/SFML/actions/runs/9408929854 ).

Combined with #10001, which has a similar source of error, it's rather disappointing that, with GitHub being home to hundreds of open source projects, nobody considered picking a few and running the new image against their workflows before publishing it. Instead we have thousands of customers, both free and commercial, affected.
(For anyone working on these images, I can recommend SFML's CI workflow as a C++ testbed, given that we build for a ton of different configurations - totally not biased. 😉)

Currently trying to figure out if we can somehow force PATH to point to the correct runtime.

Basically, error MSB8066: Custom build ... exited with code -1073741819. I've seen that exit code across many affected jobs.

That's just the standard "Access Violation" error code, more commonly seen in hexadecimal format: 0xC0000005.
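For reference, the conversion is easy to check in PowerShell:

'0x{0:X8}' -f -1073741819   # prints 0xC0000005, the NTSTATUS code for an access violation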

guusw commented 2 months ago

Same thing here: we had to update our LLVM toolchain to 17.0+ because of the updated SDK headers, and now every application built with that toolchain fails to run with a segfault: https://github.com/fragcolor-xyz/shards/actions/runs/9416413387/job/25940571766

I downloaded the built binary and ran the application on my own system and it ran without problems.

Seems indeed like a C/C++ runtime DLL issue.

mmomtchev commented 2 months ago

I was able to run it with MSVC ASAN enabled: https://github.com/mmomtchev/magickwand.js/actions/runs/9422296093/job/25958255145

The faulty DLL is C:\Windows\SYSTEM32\MSVCP140.dll

I don't have the problem with debug builds.

PS. My self-hosted build uses the exact same MSVC version (except that it is the Community edition).

sudara commented 2 months ago

I'm seeing this issue across many repositories.

I can confirm that passing this flag to cmake is a functioning workaround: -DCMAKE_CXX_FLAGS=" /D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR"
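In a workflow step, the configure/build calls look roughly like this (a sketch; the source and build directories are placeholders):

# Configure with the define added to the C++ flags; it restores the pre-17.10
# std::mutex constructor so binaries keep working with older msvcp140.dll copies.
cmake -S . -B build -DCMAKE_CXX_FLAGS="/D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR"
cmake --build build --config Release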

bdashore3 commented 2 months ago

I've had this issue as well. If rolling back is an option, here's how I did it:

Install the last version of VS build tools 17.9 with this block in your YAML:

- name: Install VS2022 BuildTools 17.9.7
  run: choco install -y visualstudio2022buildtools --version=117.9.7.0 --params "--add Microsoft.VisualStudio.Component.VC.Tools.x86.x64 --installChannelUri https://aka.ms/vs/17/release/180911598_-255012421/channel"
  if: runner.os == 'Windows'

The --installChannelUri param is used due to this issue: https://github.com/jberezanski/ChocolateyPackages/issues/100

When building, activate the VS dev shell in the same run step. (I'm not sure how to persist the dev shell activation across run steps; copying the PATH didn't seem to help.)

# --- Spawn the VS shell
if ($IsWindows) {
  Import-Module 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\Common7\Tools\Microsoft.VisualStudio.DevShell.dll'
  Enter-VsDevShell -VsInstallPath 'C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools' -DevCmdArguments '-arch=x64 -host_arch=x64'

  # Used for python building
  $env:DISTUTILS_USE_SDK=1
}

A full example is located on my ExllamaV2 fork, which had problems building on VS Build Tools v17.10.

MarkCallow commented 2 months ago

Our builds are affected too, with many in-flight PRs stalled. In our case the exceptions are all 0xC0000005, which occurs when there is a problem loading a dependent DLL; this chimes with the DLL version issues described earlier.

Needs an urgent fix. Please scrap this update, roll back to 20240514.3.0, and wait until you have a proper fix before updating again.

odygrd commented 2 months ago

Also having the same issue.

Adding this line fixed it for my project:

target_compile_definitions(${target_name} PRIVATE _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR)

https://github.com/odygrd/quill/commit/9cd69094c513e536b5ad1e7a0fbc92174a1a9cf7

llvm-beanz commented 2 months ago

@ijunaidm, the traffic on this issue suggests this is a widespread problem. Can you please roll back the image to the older version while you sort this out so that further disruption can be avoided?

Faulty CI for an extended period of time results in huge and expensive productivity disruptions.

llvm-beanz commented 2 months ago

@ashtom we’re two days into a significant outage caused by an image update to the GitHub Actions and Azure Pipelines Windows Images.

Sedeniono commented 2 months ago

Well, I also ran into the mutex problem with release builds. From a crash dump of my executable I can clearly see that it is using the VS 17.10 STL headers (_MSVC_STL_UPDATE == 202402) and that _DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR is not defined. As a result, the constexpr constructor of std::mutex is called, which initializes _Mtx_storage._Critical_section with zeros. The non-constexpr constructor would call _Mtx_init_in_situ(), which does more than just zero-initialize things.

The gotcha: when e.g. std::mutex::lock() is called, the code lives in msvcp140.dll. In the crash dump I can clearly see that it is using "C:\Windows\System32\msvcp140.dll", which is at version 14.32.31326.0. This corresponds to the STL shipped with VS 17.2. That DLL basically assumes that the mutex internals are "properly" initialized with _Mtx_init_in_situ(). But they aren't.

As mentioned by @past-due here, the VS 17.10 STL breaks compatibility (specifically in https://github.com/microsoft/STL/pull/4339).

Can't the Windows runner simply also have the latest redistributable installed? That should fix the issue.
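Until it does, one could try installing it in a workflow step (a sketch using the standard aka.ms permalink; untested on the affected image):

# Download and silently install the latest VS 2022 x64 VC++ redistributable,
# which updates vcruntime140.dll / msvcp140.dll under C:\Windows\System32.
Invoke-WebRequest -Uri 'https://aka.ms/vs/17/release/vc_redist.x64.exe' -OutFile vc_redist.x64.exe
Start-Process -FilePath .\vc_redist.x64.exe -ArgumentList '/install', '/quiet', '/norestart' -Wait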

MarkCallow commented 2 months ago

GitHub staff, please conduct a post-mortem into how this breakage happened and take steps, e.g. improved testing, to ensure such breakages do not happen again.

mmomtchev commented 2 months ago

GitHub staff, please conduct a post-mortem into how this breakage happened and take steps, e.g. improved testing, to ensure such breakages do not happen again.

Although I agree that the Github team should probably compile some non-trivial software with their runner images before deploying, the underlying issue is Microsoft's. DLLs are versioned precisely so that this does not happen. This can happen to your users too: if they don't have the correct DLL, they will get a crash.

This means that the only way to ensure that those binaries work is to include the runtime in the package. I might as well link it statically in that case; why should I use a shared library at all? Microsoft, what have you done.

lelegard commented 2 months ago

+1, like many others; previously reported as #10020 before I was redirected to this one.

The justifications from Microsoft about the proposed workaround are really pathetic. They created a huge mess in the build + test + deployment chain, and when many users complained about a crash, they closed it as "by design". By design!!! And they propose no reliable way to always select the "right" VC redistributable so that it works.

Sorry about that, GitHub team, it is probably not your fault. But all users need a reliable CI pipeline for C++ applications to build and test. And, as of today, it is simply broken. Fully broken.

I am not the only one to ask for this here: Please revert to the previous runner image until you find a way to build a reliable runner image for C++ applications with the new versions.

lelegard commented 2 months ago

@Sedeniono

Can't the Windows runner simply also have the latest redistributable installed? That should fix the issue.

I thought it would, but it seems more complicated than that.

In the workflow, before building the C++ app, I search for the "VC redistributable package" which comes with the VS I use for the build, and install it before building. So the scenario of the workflow is:

  1. Locate MSBuild.exe and vc_redist.x64.exe from the current VS install. In that case, they are both found under C:\Program Files\Microsoft Visual Studio\2022\Community
  2. Install vc_redist.x64.exe
  3. Build the C++ application using MSBuild.exe
  4. Run the application => still fails the same way

See a demonstrator here: https://github.com/lelegard/gh-runner-lock-2

So the problem is worse than that: even if you install the same VC runtime as used for the compilation, it still does not work.

lelegard commented 2 months ago

Currently, inside a GH runner, the list of PATH elements with MSVC DLLs and their version information is:

C:\Program Files\PowerShell\7
    C:\Program Files\PowerShell\7\vcruntime140_cor3.dll => 14.38.33130.0
C:\Program Files\MongoDB\Server\5.0\bin
C:\aliyun-cli
C:\vcpkg
C:\Program Files (x86)\NSIS\
C:\tools\zstd
C:\Program Files\Mercurial\
    C:\Program Files\Mercurial\msvcm90.dll => 9.00.30729.9518
    C:\Program Files\Mercurial\msvcp90.dll => 9.00.30729.9518
    C:\Program Files\Mercurial\msvcr90.dll => 9.00.30729.9518
C:\hostedtoolcache\windows\stack\2.15.7\x64
C:\cabal\bin
    ==> non existent
C:\\ghcup\bin
C:\mingw64\bin
C:\Program Files\dotnet
C:\Program Files\MySQL\MySQL Server 8.0\bin
C:\Program Files\R\R-4.4.0\bin\x64
C:\SeleniumWebDrivers\GeckoDriver
C:\SeleniumWebDrivers\EdgeDriver\
C:\SeleniumWebDrivers\ChromeDriver
C:\Program Files (x86)\sbt\bin
C:\Program Files (x86)\GitHub CLI
    ==> non existent
C:\Program Files\Git\bin
C:\Program Files (x86)\pipx_bin
C:\npm\prefix
C:\hostedtoolcache\windows\go\1.21.10\x64\bin
C:\hostedtoolcache\windows\Python\3.9.13\x64\Scripts
C:\hostedtoolcache\windows\Python\3.9.13\x64
    C:\hostedtoolcache\windows\Python\3.9.13\x64\vcruntime140_1.dll => 14.29.30139.0 built by: vcwrkspc
    C:\hostedtoolcache\windows\Python\3.9.13\x64\vcruntime140.dll => 14.29.30139.0 built by: vcwrkspc
C:\hostedtoolcache\windows\Ruby\3.0.7\x64\bin
C:\Program Files\OpenSSL\bin
C:\tools\kotlinc\bin
C:\hostedtoolcache\windows\Java_Temurin-Hotspot_jdk\8.0.412-8\x64\bin
    C:\hostedtoolcache\windows\Java_Temurin-Hotspot_jdk\8.0.412-8\x64\bin\msvcp140.dll => 14.16.27033.0 built by: vcwrkspc
    C:\hostedtoolcache\windows\Java_Temurin-Hotspot_jdk\8.0.412-8\x64\bin\vcruntime140.dll => 14.16.27033.0 built by: vcwrkspc
C:\Program Files\ImageMagick-7.1.1-Q16-HDRI
    C:\Program Files\ImageMagick-7.1.1-Q16-HDRI\msvcp140_1.dll => 14.38.33135.0
    C:\Program Files\ImageMagick-7.1.1-Q16-HDRI\msvcp140_2.dll => 14.38.33135.0
    C:\Program Files\ImageMagick-7.1.1-Q16-HDRI\msvcp140_atomic_wait.dll => 14.38.33135.0
    C:\Program Files\ImageMagick-7.1.1-Q16-HDRI\msvcp140_codecvt_ids.dll => 14.38.33135.0
    C:\Program Files\ImageMagick-7.1.1-Q16-HDRI\msvcp140.dll => 14.38.33135.0
    C:\Program Files\ImageMagick-7.1.1-Q16-HDRI\vcruntime140_1.dll => 14.38.33135.0
    C:\Program Files\ImageMagick-7.1.1-Q16-HDRI\vcruntime140_threads.dll => 14.38.33135.0
    C:\Program Files\ImageMagick-7.1.1-Q16-HDRI\vcruntime140.dll => 14.38.33135.0
C:\Program Files\Microsoft SDKs\Azure\CLI2\wbin
C:\ProgramData\kind
C:\ProgramData\docker-compose
C:\ProgramData\Chocolatey\bin
C:\Windows\system32
    C:\Windows\system32\msvcirt.dll => 7.0.20348.1 (WinBuild.160101.0800)
    C:\Windows\system32\msvcp_win.dll => 10.0.20348.1 (WinBuild.160101.0800)
    C:\Windows\system32\msvcp110_win.dll => 10.0.20348.1 (WinBuild.160101.0800)
    C:\Windows\system32\msvcp120_clr0400.dll => 12.00.52519.0 built by: VSWINSERVICING
    C:\Windows\system32\msvcp120.dll => 12.00.40660.0 built by: VSULDR
    C:\Windows\system32\msvcp140_1.dll => 14.32.31326.0
    C:\Windows\system32\msvcp140_1d.dll => 14.40.33810.0
    C:\Windows\system32\msvcp140_2.dll => 14.32.31326.0
    C:\Windows\system32\msvcp140_2d.dll => 14.40.33810.0
    C:\Windows\system32\msvcp140_atomic_wait.dll => 14.32.31326.0
    C:\Windows\system32\msvcp140_clr0400.dll => 14.32.31326.0
    C:\Windows\system32\msvcp140_codecvt_ids.dll => 14.32.31326.0
    C:\Windows\system32\msvcp140.dll => 14.32.31326.0
    C:\Windows\system32\msvcp140d_atomic_wait.dll => 14.40.33810.0
    C:\Windows\system32\msvcp140d_codecvt_ids.dll => 14.40.33810.0
    C:\Windows\system32\msvcp140d.dll => 14.40.33810.0
    C:\Windows\system32\msvcp60.dll => 7.0.20348.1 (WinBuild.160101.0800)
    C:\Windows\system32\msvcr100_clr0400.dll => 14.8.9037.0 built by: NET481REL1
    C:\Windows\system32\msvcr120_clr0400.dll => 12.00.52519.0 built by: VSWINSERVICING
    C:\Windows\system32\msvcr120.dll => 12.00.40660.0 built by: VSULDR
    C:\Windows\system32\msvcrt.dll => 7.0.20348.1 (WinBuild.160101.0800)
    C:\Windows\system32\vcruntime140_1_clr0400.dll => 14.32.31326.0
    C:\Windows\system32\vcruntime140_1.dll => 14.32.31326.0
    C:\Windows\system32\vcruntime140_1d.dll => 14.40.33810.0
    C:\Windows\system32\vcruntime140_clr0400.dll => 14.32.31326.0
    C:\Windows\system32\vcruntime140_threads.dll => 14.40.33810.0
    C:\Windows\system32\vcruntime140_threadsd.dll => 14.40.33810.0
    C:\Windows\system32\vcruntime140.dll => 14.32.31326.0
    C:\Windows\system32\vcruntime140d.dll => 14.40.33810.0
C:\Windows
C:\Windows\System32\Wbem
C:\Windows\System32\WindowsPowerShell\v1.0\
C:\Windows\System32\OpenSSH\
C:\Program Files\dotnet\
C:\Program Files\PowerShell\7\
    C:\Program Files\PowerShell\7\vcruntime140_cor3.dll => 14.38.33130.0
C:\Program Files\Microsoft\Web Platform Installer\
C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\
C:\Program Files\Microsoft SQL Server\150\Tools\Binn\
C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\
    C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\vcruntime140_cor3.dll => 14.29.30139.0 built by: vcwrkspc
C:\Program Files (x86)\WiX Toolset v3.14\bin
C:\Program Files\Microsoft SQL Server\130\DTS\Binn\
C:\Program Files\Microsoft SQL Server\140\DTS\Binn\
    ==> non existent
C:\Program Files\Microsoft SQL Server\150\DTS\Binn\
    ==> non existent
C:\Program Files\Microsoft SQL Server\160\DTS\Binn\
    ==> non existent
C:\Strawberry\c\bin
C:\Strawberry\perl\site\bin
C:\Strawberry\perl\bin
C:\ProgramData\chocolatey\lib\pulumi\tools\Pulumi\bin
C:\Program Files\TortoiseSVN\bin
C:\Program Files\CMake\bin
C:\ProgramData\chocolatey\lib\maven\apache-maven-3.8.7\bin
C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code
    C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\msvcp110.dll => 11.00.65501.17010 built by: WINCCOMP(CGTBUILD13-VCWRKSPC)
    C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\msvcp140_1.dll => 14.25.28508.3 built by: vcwrkspc
    C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\msvcp140_2.dll => 14.25.28508.3 built by: vcwrkspc
    C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\msvcp140.dll => 14.00.23026.0 built by: WCSETUP
    C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\msvcr110.dll => 11.00.65501.17010 built by: WINCCOMP(CGTBUILD13-VCWRKSPC)
    C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\vcruntime140_1.dll => 14.25.28508.3 built by: vcwrkspc
    C:\Program Files\Microsoft Service Fabric\bin\Fabric\Fabric.Code\vcruntime140.dll => 14.00.23026.0 built by: WCSETUP
C:\Program Files\Microsoft SDKs\Service Fabric\Tools\ServiceFabricLocalClusterManager
C:\Program Files\nodejs\
C:\Program Files\Git\cmd
C:\Program Files\Git\mingw64\bin
C:\Program Files\Git\usr\bin
C:\Program Files\GitHub CLI\
c:\tools\php
C:\Program Files (x86)\sbt\bin
C:\Program Files\Amazon\AWSCLIV2\
    C:\Program Files\Amazon\AWSCLIV2\VCRUNTIME140.dll => 14.38.33126.1
C:\Program Files\Amazon\SessionManagerPlugin\bin\
C:\Program Files\Amazon\AWSSAMCLI\bin\
C:\Program Files\Microsoft SQL Server\130\Tools\Binn\
C:\Program Files\LLVM\bin
    C:\Program Files\LLVM\bin\msvcp140_1.dll => 14.29.30139.0 built by: vcwrkspc
    C:\Program Files\LLVM\bin\msvcp140_2.dll => 14.29.30139.0 built by: vcwrkspc
    C:\Program Files\LLVM\bin\msvcp140_atomic_wait.dll => 14.29.30139.0 built by: vcwrkspc
    C:\Program Files\LLVM\bin\msvcp140_codecvt_ids.dll => 14.29.30139.0 built by: vcwrkspc
    C:\Program Files\LLVM\bin\msvcp140.dll => 14.29.30139.0 built by: vcwrkspc
    C:\Program Files\LLVM\bin\vcruntime140_1.dll => 14.29.30139.0 built by: vcwrkspc
    C:\Program Files\LLVM\bin\vcruntime140.dll => 14.29.30139.0 built by: vcwrkspc
C:\Users\runneradmin\.dotnet\tools
C:\Users\runneradmin\.cargo\bin
C:\Users\runneradmin\AppData\Local\Microsoft\WindowsApps

This is a bit of a mess. There is no real confidence in which MSVC runtime will be used.

The result was obtained in a workflow with this command:

foreach ($d in ($env:Path -split ';')) {
    Write-Output "$d"
    if (-not -not $d) {
        if (Test-Path -PathType Container $d) {
            Get-ChildItem "$d\*" -Include @("vcruntime*.dll", "msvc*.dll") |
                ForEach-Object { (Get-Command $_).FileVersionInfo } |
                ForEach-Object { Write-Output "    $($_.FileName) => $($_.FileVersion)" }
        }
        else {
            Write-Output "    ==> non existent"
        }
    }
}
MarkCallow commented 2 months ago

Although I agree that the Github team should probably compile some non-trivial software with their runner images before deploying,

Compilation isn't the issue here. Running the compiled software is. They need to build and run some non-trivial software. (In my case the crashes happen whenever the applications being tested use std::mutex. If the test takes the app through a path that does not use std::mutex, it runs fine.)

a-zakir commented 2 months ago

GitHub, what have you done!? We're facing ugly segfaults: https://github.com/AntaresSimulatorTeam/Antares_Simulator/actions/runs/9414209107/job/25932593109

a-zakir commented 2 months ago

Although I agree that the Github team should probably compile some non-trivial software with their runner images before deploying,

Compilation isn't the issue here. Running the compiled software is. They need to build and run some non-trivial software. (In my case the crashes happen whenever the applications being tested use std::mutex. If the test takes the app through a path that does not use std::mutex, it runs fine.)

This is a weak assumption; what makes you think that "they" do not sufficiently test mutex?

FangCunWuChang commented 2 months ago

I hit this too; protoc then returned 0xc0000142.

agarny commented 2 months ago

Also affected by this issue and, like some others here, the /D_DISABLE_CONSTEXPR_MUTEX_CONSTRUCTOR workaround solved it for me. What a waste of time though! :exploding_head:

MarkCallow commented 2 months ago

Compilation isn't the issue here. Running the compiled software is. They need to build and run some non-trivial software. (In my case the crashes happen whenever the applications being tested use std::mutex. If the test takes the app through a path that does not use std::mutex, it runs fine.)

This is a weak assumption; what makes you think that "they" do not sufficiently test mutex?

I didn't claim they are insufficiently testing mutex. I simply said that, in my case, that is where the crash happens. I do claim that they are insufficiently testing the running of software built with, and running in, the new runner environment.

Whatever the cause, the point is that a major breakage made it through their test and release system all the way to the production runner image without being detected. This is why I am requesting they conduct a post-mortem so they can figure out why and put in place steps to prevent it happening again.

shamil-mubarakshin commented 2 months ago

Hello, we have identified a possible cause of the DLL version mismatch. The TortoiseSVN tool, when installed after Visual Studio, overwrites files such as vcruntime140.dll and MSVCP140.dll under c:\Windows\system32. With the PR, the DLLs are at the latest version, which I have tested on two projects:

  1. It fixes @ScottTodd's segfaults when run from https://github.com/iree-org/iree/commit/29472a1637146514c2781c3befca34939c6e7480
  2. It partially fixes @llvm-beanz's build issues in https://github.com/microsoft/DirectXShaderCompiler: the Release stage completes, while the Debug stage ends with warning C4201 (nonstandard extension used: nameless struct/union) treated as an error.

We will start deployment with the mentioned fix this week.
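In the meantime, the affected system copies can be checked directly on a runner (a quick sanity-check sketch):

# The copies TortoiseSVN can overwrite live under System32; for binaries built
# with the VS 17.10 toolset they should report at least 14.40.*.
foreach ($dll in 'vcruntime140.dll', 'msvcp140.dll') {
    $path = Join-Path $env:SystemRoot "System32\$dll"
    '{0} => {1}' -f $path, (Get-Item $path).VersionInfo.FileVersion
}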