GoogleCloudPlatform / kms-integrations

https://cloud.google.com/kms
Apache License 2.0
39 stars 13 forks

Some signtool versions (e.g. 10.0.22621.0) occasionally fail or never return #19

Open obones opened 1 year ago

obones commented 1 year ago

Hello,

I have been using the CNG provider (version 0.8) for the past month, and it's really nice to be able to continue using signtool in our build environment. What I have noticed, though, is that from time to time signtool misbehaves, with one of the following two outcomes:

  1. exit with an exit code different from 0 but no error message whatsoever
  2. use 1 CPU core forever, never exiting.

This never happened before we moved to the CNG provider. I have tried putting calls to signtool in a batch loop but have not been able to reproduce the problem reliably.

I'm still investigating the situation, but if this rings a bell with anyone, I'd be quite relieved to learn that I'm not the only one seeing this.
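For anyone trying to reproduce this, a loop that also enforces a timeout can count both failure modes (a non-zero exit code and a process that never exits). The following is only a sketch, assuming Python is available on the build machine; `run_repeatedly` and its parameters are hypothetical names, and the command you pass in would be your own signtool invocation.

```python
import subprocess
from collections import Counter


def run_repeatedly(cmd, iterations=100, timeout_s=120):
    """Run cmd several times and tally exit codes and timeouts.

    cmd is an argument list, e.g. your usual signtool command line.
    A run that exceeds timeout_s is killed and counted as "timeout".
    """
    outcomes = Counter()
    for _ in range(iterations):
        try:
            result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
            outcomes[result.returncode] += 1
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this exception.
            outcomes["timeout"] += 1
    return outcomes
```

The returned counter maps each exit code (or the string "timeout") to the number of runs that ended that way, which makes an intermittent failure rate easier to report.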

obones commented 1 year ago

I just had a failure where signtool returned -1073740940, which apparently is the value defined for STATUS_HEAP_CORRUPTION.

This could explain the "never ending loop" I'm seeing in some cases.
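The mapping from the signed exit code to the NTSTATUS value can be checked by reinterpreting it as an unsigned 32-bit number. This is a small sketch; `describe_exit_code` is a hypothetical helper, and the lookup table contains only the one status code discussed here.

```python
# Lookup limited to the status code mentioned in this thread.
KNOWN_STATUS = {0xC0000374: "STATUS_HEAP_CORRUPTION"}


def describe_exit_code(exit_code: int) -> str:
    """Format an exit code (signed or unsigned) as 32-bit NTSTATUS hex."""
    value = exit_code & 0xFFFFFFFF  # reinterpret as unsigned 32-bit
    name = KNOWN_STATUS.get(value, "unknown")
    return f"0x{value:08X} {name}"


print(describe_exit_code(-1073740940))  # prints "0xC0000374 STATUS_HEAP_CORRUPTION"
```

Masking with `0xFFFFFFFF` means the helper accepts both the signed form shown by batch scripts and the unsigned form (3221226356) that some tools report.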

ysichrisdag commented 1 year ago

I have also been experiencing this problem, seemingly at random. Once it starts happening, it is consistent for minutes or hours and then starts working correctly again for no discernible reason. The really frustrating part is that the log output from signtool indicates that the file was successfully signed with no warnings or errors, so this appears to occur in the exit routines.

Faulting application name: signtool.exe, version: 10.0.18362.1, time stamp: 0x2d8c3e38
Faulting module name: ntdll.dll, version: 10.0.14393.5980, time stamp: 0x6459ba02
Exception code: 0xc0000374
Fault offset: 0x00000000000f7183
Faulting process id: 0x17d0
Faulting application start time: 0x01d9fc51b2a236ba
Faulting application path: C:\Program Files (x86)\Windows Kits\10\bin\10.0.18362.0\x64\signtool.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
Report Id: 10710708-27e3-4908-9451-702b0da8fdb5
Faulting package full name:
Faulting package-relative application ID:

ysichrisdag commented 1 year ago

I enabled gRPC trace logging with GRPC_TRACE=all and didn't really get any useful information, except that gRPC completes its shutdown routine BEFORE the application crashes. This is the end of the output, just before crash details like the event log entry above.

I1011 15:16:45.019659 4472 logging.cc:48] [external/com_github_grpc_grpc/src/core/lib/iomgr/executor.cc:397]: EXECUTOR Executor::ShutdownAll() enter
I1011 15:16:45.019708 4472 logging.cc:48] [external/com_github_grpc_grpc/src/core/lib/iomgr/executor.cc:142]: EXECUTOR (default-executor) SetThreading(0) begin
I1011 15:16:45.019782 8912 logging.cc:48] [external/com_github_grpc_grpc/src/core/lib/iomgr/executor.cc:234]: EXECUTOR (default-executor) [0]: shutdown
I1011 15:16:45.019886 4472 logging.cc:48] [external/com_github_grpc_grpc/src/core/lib/iomgr/executor.cc:188]: EXECUTOR (default-executor) Thread 1 of 1 joined
I1011 15:16:45.019975 4472 logging.cc:48] [external/com_github_grpc_grpc/src/core/lib/iomgr/executor.cc:209]: EXECUTOR (default-executor) SetThreading(0) done
I1011 15:16:45.020036 4472 logging.cc:48] [external/com_github_grpc_grpc/src/core/lib/iomgr/executor.cc:142]: EXECUTOR (resolver-executor) SetThreading(0) begin
I1011 15:16:45.020119 2420 logging.cc:48] [external/com_github_grpc_grpc/src/core/lib/iomgr/executor.cc:234]: EXECUTOR (resolver-executor) [0]: shutdown
I1011 15:16:45.020205 4472 logging.cc:48] [external/com_github_grpc_grpc/src/core/lib/iomgr/executor.cc:188]: EXECUTOR (resolver-executor) Thread 1 of 1 joined
I1011 15:16:45.020271 4472 logging.cc:48] [external/com_github_grpc_grpc/src/core/lib/iomgr/executor.cc:209]: EXECUTOR (resolver-executor) SetThreading(0) done
I1011 15:16:45.020338 4472 logging.cc:48] [external/com_github_grpc_grpc/src/core/lib/iomgr/executor.cc:426]: EXECUTOR Executor::ShutdownAll() done

ysichrisdag commented 1 year ago

As a temporary workaround, we have confirmed that signtool.exe from the Windows 8.1 SDK does not have this problem; however, signtool from every Windows 10 SDK version I've tried does have this issue.

bdhess commented 1 year ago

I was able to reproduce this and get a stack trace using DebugDiag.

DetailID = 1
    Count:    1
    Exception #:  0X80000003
    Stack:        
        ntdll!RtlIsZeroMemory+0xa2
        ntdll!_misaligned_access+0x41a
        ntdll!_misaligned_access+0x6fa
        ntdll!_misaligned_access+0xad79
        ntdll!_misaligned_access+0x3a0
        ntdll!RtlEnterCriticalSection+0xcf4
        ntdll!RtlGetCurrentServiceSessionId+0xbf0
        ntdll!RtlFreeHeap+0x51
        ncrypt!NCryptFreeObject+0x4cb
        CRYPT32!I_CertWnfEnableFlushCache+0xef5a
        CRYPT32!CertCloseStore+0xa3
        signtool+0x34bee
        signtool+0x262c1
        signtool+0x28e35
        signtool+0x2f966
        signtool+0x426f1
        KERNEL32!BaseThreadInitThunk+0x10
        ntdll!RtlUserThreadStart+0x2b

Notably, kmscng isn't in the stack trace. I think signtool has a bug on the code path that is used to close the certificate that was loaded from the filesystem. :-(

obones commented 1 year ago

Thanks for the call stack. The absence of kmscng in the stack trace does not necessarily mean it's not the culprit. It could corrupt the heap with a buffer overflow and still be absent from the stack trace at the point where the memory is released, which seems to be the case here. As to why it would corrupt the heap and/or stack, that's anyone's guess, but I would go for either a badly sized array (off by one) or a platform-dependent structure that is wrongly declared. The latter has my preference: the x64 version of signtool is so seldom used that I would not be surprised if it's not as field-tested as the x86 one.

bdhess commented 1 year ago

@obones I don't think your theory is impossible, but a bug in signtool feels more likely to me.

We're looking into asan/msan builds on Windows so that we can determine this more conclusively.

davoustp commented 10 months ago

Hi @bdhess, hi @obones, hi @ysichrisdag,

We are experiencing the exact same symptom (x64 signtool.exe hanging for no obvious reason) on various systems (10+ different build servers / CI) with CNG provider v1.0 and Windows 11 SDK version 10.0.22621.0. Reverting to x64 signtool.exe version 6.3.9600.17298 from the Windows 8.1 SDK (from https://developer.microsoft.com/en-us/windows/downloads/sdk-archive/ ) proved successful: no issue detected, and signing works like a charm.

Is there anything we could do to help narrow the problem down?

bbamsch commented 5 months ago

With signtool.exe from the latest Windows 11 SDK version (10.0.26100.1), I am no longer able to reproduce this issue when using the kmscng library to perform remote signing via Cloud KMS, even over a large number of iterations.

If you are affected by this, I would suggest upgrading to the Windows 11 SDK version 10.0.26100.1 or later. I tested with the Windows SDK Build Tools package available via NuGet: https://www.nuget.org/packages/Microsoft.Windows.SDK.BuildTools/10.0.26100.1
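A build script could check the SDK version it picks up against that threshold before signing. The following is a minimal sketch; `MIN_RECOMMENDED` and `is_recommended` are hypothetical names, and the threshold is simply the version reported above as no longer reproducing the issue.

```python
# Threshold taken from the comment above; an assumption, not an official minimum.
MIN_RECOMMENDED = (10, 0, 26100, 1)


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '10.0.22621.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_recommended(version: str) -> bool:
    """True if version is at or above the recommended SDK version."""
    return parse_version(version) >= MIN_RECOMMENDED


print(is_recommended("10.0.22621.0"))  # prints "False"
print(is_recommended("10.0.26100.1"))  # prints "True"
```

Note that this check only covers the upgrade path: the Windows 8.1 SDK signtool (6.3.9600.17298), reported earlier in this thread as a working fallback, would be flagged as below the threshold.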