dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Curl exception: Problem with the SSL CA cert (path? access rights?) on centos, fedora, redhat. #20237

Closed danmoseley closed 4 years ago

danmoseley commented 7 years ago

https://ci.dot.net/job/dotnet_corefx/job/master/job/outerloop_portablelinux_debug/lastCompletedBuild/testReport/System.Net.Tests/HttpWebRequestHeaderTest/GetResponse_UseDefaultCredentials_ExpectSuccess_remoteServer__https___corefx_net_cloudapp_net_Echo_ashx_/

MESSAGE:
System.Net.WebException : An error occurred while sending the request. Problem with the SSL CA cert (path? access rights?)\n---- System.Net.Http.HttpRequestException : An error occurred while sending the request.\n-------- System.Net.Http.CurlException : Problem with the SSL CA cert (path? access rights?)
+++++++++++++++++++
STACK TRACE:
   at System.Net.HttpWebRequest.GetResponse() in /mnt/resource/j/workspace/dotnet_corefx/master/outerloop_portablelinux_debug/src/System.Net.Requests/src/System/Net/HttpWebRequest.cs:line 989
   at System.Net.Tests.HttpWebRequestHeaderTest.GetResponse_UseDefaultCredentials_ExpectSuccess(Uri remoteServer) in /mnt/resource/j/workspace/dotnet_corefx/master/outerloop_portablelinux_debug/src/System.Net.Requests/tests/HttpWebRequestHeaderTest.cs:line 46
----- Inner Stack Trace -----
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpClient.<FinishSendAsyncUnbuffered>d__59.MoveNext() in /mnt/resource/j/workspace/dotnet_corefx/master/outerloop_portablelinux_debug/src/System.Net.Http/src/System/Net/Http/HttpClient.cs:line 487
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.HttpWebRequest.<SendRequest>d__188.MoveNext() in /mnt/resource/j/workspace/dotnet_corefx/master/outerloop_portablelinux_debug/src/System.Net.Requests/src/System/Net/HttpWebRequest.cs:line 1192
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()
   at System.Net.HttpWebRequest.GetResponse() in /mnt/resource/j/workspace/dotnet_corefx/master/outerloop_portablelinux_debug/src/System.Net.Requests/src/System/Net/HttpWebRequest.cs:line 985
----- Inner Stack Trace -----
   at System.Net.Http.CurlHandler.ThrowIfCURLEError(CURLcode error) in /mnt/resource/j/workspace/dotnet_corefx/master/outerloop_portablelinux_debug/src/System.Net.Http/src/System/Net/Http/Unix/CurlHandler.cs:line 640
   at System.Net.Http.CurlHandler.MultiAgent.FinishRequest(StrongToWeakReference`1 easyWrapper, CURLcode messageResult) in /mnt/resource/j/workspace/dotnet_corefx/master/outerloop_portablelinux_debug/src/System.Net.Http/src/System/Net/Http/Unix/CurlHandler.MultiAgent.cs:line 852

@steveharter @Priya91 please either fix or (presumably) disable today, so we can get a green badge.

Priya91 commented 7 years ago

I've been reading the curl docs around this failure, and since we don't support ssl scenarios with nss as backend, one option is to point the cert database at an empty dir location with the SSL_DIR env variable. It should then bypass the nss db load initialization here

stephentoub commented 7 years ago

since we don't support ssl scenarios with nss as backend

We do. We just don't support customizing them, e.g. using a cert validation callback, enabling revocation checking, etc. But you can certainly use HttpClient with such a backend to connect to https endpoints.

tmds commented 7 years ago

@karelz @Priya91 @stephentoub Users running on RHEL should use the .NET Core packages for RHEL, which include a newer version of libcurl that is linked with OpenSSL.

I've provided some libcurl builds that provide debug info which should help find the root cause of this issue. I don't think these are installed on a CI machine.

Perhaps this issue should no longer target 2.0?

karelz commented 7 years ago

This issue is the top test failure, making our Networking test suite unreliable. That's why we are keeping it in 2.0. I would really like to see it addressed (somehow) in 2.0, to lower test-failure noise in 2.x servicing branches.

tmds commented 7 years ago

somehow:

karelz commented 7 years ago

Disabling the test is not feasible, because the failure shows up in random tests all over.

I am waiting on @Priya91's recommendation ...

Priya91 commented 7 years ago

@tmds You mention that Users running on rhel should use the .NET Core packages of rhel which includes a newer version of libcurl and is linked with openssl?

Does that mean that on installing a rhel image, the default curl install will now be built with openssl? Also, how do we update the CI machines to get this libcurl version? It would be ideal if we could get it from scl rather than install a custom build, which would make later re-imaging difficult. We have the same issue on centos and fedora; would the same upgrade work there as well?

tmds commented 7 years ago

@Priya91 On RHEL the default version of curl is an 'old' one and it stays that way to keep compatibility. If you install dotnet on RHEL (per https://www.microsoft.com/net/core#linuxredhat), this will include a more recent version of libcurl. This libcurl does not replace the system version. It is only used when doing dotnet stuff. It gets used because when you do scl enable rh-dotnetcoreXX bash this sets the LD_LIBRARY_PATH envvar which makes it find the more recent libcurl.

You can use this libcurl by installing the dotnet scl and setting the LD_LIBRARY_PATH (to match the envvar when the scl is enabled).
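A minimal shell sketch of that approach, assuming a libcurl built against OpenSSL is available in some directory. The path `/opt/example/libcurl-openssl/lib` and the helper name `prepend_ld_library_path` are hypothetical; on RHEL the real value is whatever `scl enable` sets. The variable has to be in place before `dotnet` starts, because the dynamic loader reads it at process launch.

```shell
# Hypothetical location of a libcurl built against OpenSSL; adjust for your machine.
CURL_LIB_DIR="${CURL_LIB_DIR:-/opt/example/libcurl-openssl/lib}"

# Put a directory at the front of LD_LIBRARY_PATH so the loader searches it
# before the system library directories.
prepend_ld_library_path() {
    if [ -n "${LD_LIBRARY_PATH:-}" ]; then
        LD_LIBRARY_PATH="$1:$LD_LIBRARY_PATH"
    else
        LD_LIBRARY_PATH="$1"
    fi
    export LD_LIBRARY_PATH
}

prepend_ld_library_path "$CURL_LIB_DIR"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
# Start the application from this same shell so it inherits the variable.
```

Prepending rather than overwriting keeps any library paths that were already configured.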

CentOS: there is no dotnet package yet. We will work on this after shipping 2.0 for RHEL. It will work in the same manner as the RHEL package.

Fedora: If this issue only happens on Fedora 24 (and not on Fedora 25), it's safe to ignore it. 24 is EOL in September 2017.

karelz commented 7 years ago

@Priya91 what is your recommendation? What are the next steps?

Priya91 commented 7 years ago

I'm gonna first try setting the SSL_DIR env variable to an empty dir on the initialization script as it's easier to set this up than re-imaging.

Priya91 commented 7 years ago

Made the change to provide empty cert dir with SSL_DIR env variable. Closing as fixed, please re-open if failures reappear.

karelz commented 7 years ago

Great, thanks @Priya91!

v-haren commented 7 years ago

failed again in ci, detail: https://ci.dot.net/job/dotnet_corefx/job/master/job/outerloop_netcoreapp_centos7.1_release/45/testReport/System.Net.Http.Functional.Tests/HttpClientHandler_SslProtocols_Test/GetAsync_SupportedSSLVersion_Succeeds_sslProtocols__Tls__url____https___www_ssllabs_com_10301____/

KristinXie1 commented 7 years ago

This issue reproduces on Portable Core Tests in build 20170517.01, detail: https://mc.dot.net/#/product/netcore/master/source/official~2Fcorefx~2Fmaster~2F/type/test~2Ffunctional~2Fportable~2Fcli~2F/build/20170517.01/workItem/System.Net.Http.Functional.Tests/analysis/xunit/System.Net.Http.Functional.Tests.HttpClientHandler_ServerCertificates_Test~2FNoCallback_RevokedCertificate_NoRevocationChecking_Succeeds

karelz commented 7 years ago

The one CI failure is surprising - likely due to a stale VM which didn't get the workaround yet. The MC failure is the first one in a long time. If we see more instances of that, we should create a new bug.

sharok commented 7 years ago

@karelz I have the same issue in my project and I can't make any https request from my .NET Core (1.1.1) application. I hit it when sending email with the Amazon SDK. https://github.com/aws/aws-sdk-net/issues/662

Is there any workaround?

tmds commented 7 years ago

@sharok so far this issue was only observed on CentOS, Fedora and RHEL. It seems you are hitting this on Ubuntu 14.04. Is that right? Can you check the version of libcurl? Do you have a minimal repro?

sharok commented 7 years ago

@tmds Yes, I use Ubuntu 14.04. I don't have a repro, but you can try a simple ASP.NET Core application and put this code in a controller:

using (var client = new HttpClient())
{
    var contents = await client.GetStringAsync("https://httpbin.org/get");    
}

This is the output of curl --version command:

curl 7.35.0 (x86_64-pc-linux-gnu) libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP

In the ticket that I mentioned above you can find info about the environment. Btw, curl itself works fine.
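The first line of that `curl --version` output names the TLS library libcurl was built against (`OpenSSL/1.0.1f` here). A small sketch that extracts it follows. The function name `tls_backend` is made up for illustration, and the `curl` binary on `PATH` is not necessarily linked to the same libcurl that the .NET process loads.

```shell
# Print the TLS backend named in the first line of `curl --version` output.
tls_backend() {
    case "$1" in
        *" NSS/"*)     echo "nss" ;;
        *" OpenSSL/"*) echo "openssl" ;;
        *)             echo "unknown" ;;
    esac
}

# Banner copied from the output above.
banner='curl 7.35.0 (x86_64-pc-linux-gnu) libcurl/7.35.0 OpenSSL/1.0.1f zlib/1.2.8 libidn/1.28 librtmp/2.3'
tls_backend "$banner"

# On a live system, feed it the real first line instead:
#   tls_backend "$(curl --version | head -n 1)"
```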

karelz commented 7 years ago

@sharok what is the exact failure you're getting with the code above?

sharok commented 7 years ago

@karelz This is a full stack trace:

2017-05-31T03:59:15.6178500+00:00 0HL57R5JN7J7L [ERR] An error occurred while sending the request. (37785cd4)
System.AggregateException: One or more errors occurred. (An error occurred while sending the request.) ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.Http.CurlException: Problem with the SSL CA cert (path? access rights?)
   at System.Net.Http.CurlHandler.ThrowIfCURLEError(CURLcode error)
   at System.Net.Http.CurlHandler.MultiAgent.FinishRequest(StrongToWeakReference`1 easyWrapper, CURLcode messageResult)
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpClient.<FinishSendAsync>d__58.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpClient.<GetContentAsync>d__32`1.MoveNext()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at log.elmahbucket.io.Controllers.WhoamiController.Get() in C:\BuildAgent\work\d2f849282c12398d\src\log.elmahbucket.io\Controllers\WhoamiController.cs:line 33
   at lambda_method(Closure , Object , Object[] )
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.<InvokeActionMethodAsync>d__27.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.<InvokeNextActionFilterAsync>d__25.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Rethrow(ActionExecutedContext context)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.<InvokeNextExceptionFilterAsync>d__24.MoveNext()
---> (Inner Exception #0) System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.Http.CurlException: Problem with the SSL CA cert (path? access rights?)
   at System.Net.Http.CurlHandler.ThrowIfCURLEError(CURLcode error)
   at System.Net.Http.CurlHandler.MultiAgent.FinishRequest(StrongToWeakReference`1 easyWrapper, CURLcode messageResult)
   --- End of inner exception stack trace ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpClient.<FinishSendAsync>d__58.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
   at System.Net.Http.HttpClient.<GetContentAsync>d__32`1.MoveNext()<---

karelz commented 7 years ago

Does it repro 100%? Can you try it on .NET Core 2.0 Preview 1?

sharok commented 7 years ago

@karelz It's a production machine, and I don't want to break something, so I will create a new one later and will try on Preview 1.

karelz commented 7 years ago

Great! Let us know how it goes. I doubt it will make the problem go away, but it will be easier to focus on a solution if we have confirmed 100% repro on 2.0 Preview 1. Thanks!

Priya91 commented 7 years ago

@sharok There could be a number of reasons. If your certs were updated recently and the cert store was not, that can result in this error; try running sudo update-ca-certificates. We have not observed this error with the openssl crypto backend. If you can, set the env variable CURLHANDLER_DEBUG_VERBOSE=true. You should get some diagnostic data that can help with debugging.
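A sketch of those two suggestions as shell commands. `update-ca-certificates` needs root, so it is left commented out, and the application name in the last comment is a placeholder.

```shell
# Refresh the OpenSSL certificate store (Debian/Ubuntu); requires root.
#   sudo update-ca-certificates

# Ask CurlHandler for verbose diagnostics, as suggested above. The variable
# must be exported so processes started from this shell inherit it.
export CURLHANDLER_DEBUG_VERBOSE=true

# Confirm a child process sees it before launching the real application.
sh -c 'echo "CURLHANDLER_DEBUG_VERBOSE=${CURLHANDLER_DEBUG_VERBOSE:-unset}"'

# Then run the application from this shell, for example:
#   dotnet MyApp.dll    # MyApp.dll is a placeholder name
```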

jaredrsowers commented 7 years ago

We are seeing this same issue in 1.1.2 running on CentOS... We are randomly getting the SSL failures on a production box. So, if your fix only is targeted at "fixing your tests" and not "fixing the coreclr/curl SSL issue" then I think you are being too narrow minded. For example, installing a custom curl on your build boxes and not documenting and publishing it as a workaround that others could do would just be masking the issue, while on the other hand, bundling curl with coreclr would be a more robust "workaround"... Whatever the outcome is, it will be needed on production boxes, and not just on build/test servers...

Warr1024 commented 7 years ago

The issue seems to be concurrency-related, and affects arbitrary .NET Core applications. With this program, I'm able to reproduce the problem about 25% of the time, using a 4-core CentOS 7 VM.

Decreasing the number of threads seems to make the problem happen less frequently, with the problem not reproducible at all with only one thread. Increasing the number of threads did not significantly increase probability.

using System;
using System.Linq;
using System.Net;
using System.Threading;

namespace ConsoleApp2
{
    class Program
    {
        const string URI = "https://example.com";
        static int Main(string[] args)
        {
            int Result = 0;
            var Workers = Enumerable.Range(0, 4)
                .Select(I => new Thread(() =>
                {
                    try
                    {
                        var Req = WebRequest.CreateHttp(URI);
                        Req.Method = "HEAD";
                        using ( Req.GetResponseAsync().Result ) ;
                    }
                    catch ( Exception ex )
                    {
                        Console.WriteLine($"{I}: {ex}");
                        Result = 1;
                    }
                })
                { IsBackground = true })
                .ToArray();
            foreach ( var T in Workers )
                T.Start();
            foreach ( var T in Workers )
                T.Join();
            return Result;
        }
    }
}

System/version information:

# uname -a
Linux frqaclapp01.expoexchange.com 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

# dotnet --version
1.0.4

# curl --version
curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.21 Basic ECC zlib/1.2.7 libidn/1.28 libssh2/1.4.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz unix-sockets

# rpm -qa | perl -ne 'm/curl|nss\b/&&print'
curl-7.29.0-35.el7.centos.x86_64
nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64
nss-softokn-3.16.2.3-14.4.el7.x86_64
nss-sysinit-3.28.4-1.2.el7_3.x86_64
nss-tools-3.28.4-1.2.el7_3.x86_64
libcurl-7.29.0-35.el7.centos.x86_64
python-pycurl-7.19.0-19.el7.x86_64
nss-softokn-freebl-3.16.2.3-14.4.el7.i686
nss-util-3.28.4-1.0.el7_3.x86_64
nss-3.28.4-1.2.el7_3.x86_64

[EDIT] Add C# syntax highlight by @karelz

jmacnett commented 7 years ago

The results @Warr1024 found are also reproducible in 2.0 preview 2, running within a centos 7 docker container. The ratio of errors to success jumps up to about 50% from 25% over an extended period.

System details:

# uname -a
Linux 5fb57c34ad68 4.9.36-moby #1 SMP Wed Jul 12 15:29:07 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

# dotnet --version
2.0.0-preview2-006497

# curl --version
curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.21 Basic ECC zlib/1.2.7 libidn/1.28 libssh2/1.4.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz unix-sockets

# rpm -qa | perl -ne 'm/curl|nss\b/&&print'
nss-softokn-3.16.2.3-14.4.el7.x86_64
nss-sysinit-3.28.4-1.2.el7_3.x86_64
libcurl-7.29.0-35.el7.centos.x86_64
python-pycurl-7.19.0-19.el7.x86_64
nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64
nss-util-3.28.4-1.0.el7_3.x86_64
nss-3.28.4-1.2.el7_3.x86_64
nss-tools-3.28.4-1.2.el7_3.x86_64
curl-7.29.0-35.el7.centos.x86_64

psychohamster commented 7 years ago

I'm having the issue using 1.1 on CentOS 7, and I'm having trouble trying the workaround as suggested by @Priya91 and @tmds. I can set the LD_LIBRARY_PATH env var with a path that contains a libcurl built against openssl, and have confirmed in the code it's getting it with an Environment.GetEnvironmentVariable("LD_LIBRARY_PATH") call dumping to the console. And yet, when I enable event tracing I can see the debug message from curlhandler https://github.com/dotnet/corefx/blob/108260a51b52d40c848c99dd903e2e1de4d9eb62/src/System.Net.Http/src/System/Net/Http/Unix/CurlHandler.cs#L172 confirming it's getting the NSS based one

Does the path variable override only work with the 2.0 runtime?
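One way to check which libcurl a running process actually mapped, independent of what the environment variable says, is to look at `/proc/<pid>/maps` on Linux. A sketch follows; the function name `loaded_libcurl` is hypothetical, and the sample file stands in for a real process.

```shell
# List the distinct libcurl files mapped into a process.
# $1 is a maps file; for a live process pass /proc/<pid>/maps.
loaded_libcurl() {
    awk '$6 ~ /libcurl/ { print $6 }' "$1" | sort -u
}

# Self-contained demo using a fake maps file in the /proc/<pid>/maps layout
# (address, perms, offset, device, inode, path).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
7f0000000000-7f0000060000 r-xp 00000000 fd:00 1234 /usr/lib64/libcurl.so.4.3.0
7f0000060000-7f0000260000 ---p 00060000 fd:00 1234 /usr/lib64/libcurl.so.4.3.0
7f0000300000-7f0000400000 r-xp 00000000 fd:00 5678 /usr/lib64/libnss3.so
EOF
result="$(loaded_libcurl "$sample")"
echo "$result"
rm -f "$sample"
```

If the path printed for the real process is still the system library, the override did not reach the loader, for example because the variable was set after the process had already started.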

karelz commented 7 years ago

@jaredrsowers

bundling curl with coreclr would be a more robust "workaround"

While you are correct it also comes with huge burden and cost - see https://github.com/dotnet/corefx/issues/16201#issuecomment-288769690. We considered this approach (also from other reasons), but rejected it - see dotnet/corefx#17647.

@tmds @Priya91 can we document clearly how to work around this problem on older CentOS versions? If we see more customers hitting the problem, we should probably mention it in the release notes or the supported OS doc.

jaredrsowers commented 7 years ago

@karelz I am unclear what you mean by

older CentOS versions

This is an easily reproducible issue on the latest, fully patched, CentOS. And, I agree, it would be fantastic to get a clearly documented "workaround" for this.

karelz commented 7 years ago

I am unclear what you mean by

I mixed it up a bit, sorry (too long thread and lack of my expertise on Linux combined) - it is not about older CentOS, it is about CentOS not having .NET Core 2.0 package yet, per @tmds comment here: https://github.com/dotnet/corefx/issues/16201#issuecomment-300264672 The version of curl in CentOS distro is just not working well with .NET Core as I understand it.

jaredrsowers commented 7 years ago

I think the perfect "workaround" for CentOS would be either to distribute this "more recent version of libcurl" that is installed on RHEL, so that we could then point LD_LIBRARY_PATH at it, or, if that isn't possible, to outline the steps needed to properly compile and generate it ourselves...

karelz commented 7 years ago

outline the steps taken to properly compile it and generate it ourselves.

That sounds very reasonable - I am waiting on @tmds or @Priya91 to provide more details here ...

Priya91 commented 7 years ago

You can do one of the following:

  1. Set the SSL_DIR env variable to an empty dir without certs. The nss initialization will then not use the default location of /etc/pki/nssdb here, which may contain out-of-date certs that result in CURLE_SSL_CACERT_BADFILE. This is the workaround that was used to unblock CI.

  2. Use a custom curl install built with openssl. The curl site documents how to build a custom curl with the openssl crypto backend.

  3. If this problem occurs with openssl as the backend, run sudo update-ca-certificates. There may be updates to the certs in /usr/share/ca-certificates that were not propagated to the openssl cert store location, usually /etc/ssl/certs.
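Option 1 can be sketched in shell as follows, assuming the variable is set in whatever script launches the application. The directory is created fresh with `mktemp -d`, though any empty directory the process can read should serve. The application name in the final comment is a placeholder.

```shell
# Create a fresh, empty directory and point NSS at it through SSL_DIR,
# so that (per option 1) the default database under /etc/pki/nssdb is not used.
EMPTY_NSS_DIR="$(mktemp -d)"
export SSL_DIR="$EMPTY_NSS_DIR"

# Sanity check: the directory exists and contains no entries.
if [ -d "$SSL_DIR" ] && [ -z "$(ls -A "$SSL_DIR")" ]; then
    echo "SSL_DIR=$SSL_DIR (empty)"
fi

# Launch the application from this shell so it inherits SSL_DIR:
#   dotnet MyApp.dll    # placeholder name
```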

jmacnett commented 7 years ago

For what it's worth, the only way we found to 100% eliminate this issue was to throw a lock around the web call in C#, although we also had success with a retry-on-fail try-catch block.

tmds commented 7 years ago

Red Hat's build of .NET Core comes with a newer version of libcurl. This doesn't replace the system version (for which there are strong API/ABI guarantees). For CentOS we need the same. @Priya91 listed the available options until there is a CentOS build available.

jaredrsowers commented 6 years ago

@tmds, I just noticed that Microsoft is now distributing CentOS builds through RPM packages for the install of dotnet core 2 (https://www.microsoft.com/net/learn/get-started/linuxcentos), do these packages include an updated libcurl, or does the issue still exist?

Red Hat's build of .NET Core comes with a newer version of libcurl. This doesn't replace the system version (for which there are strong API/ABI guarantees). For CentOS we need the same. @Priya91 listed the available options until there is a CentOS build available.

omajid commented 6 years ago

do these packages include an updated libcurl

AFAIK, no.

But you can use the RHEL steps on CentOS 7 too: https://stackoverflow.com/a/46733657/3561275

jaredrsowers commented 6 years ago

But you can use the RHEL steps on CentOS 7 too: https://stackoverflow.com/a/46733657/3561275

Well whaddya know.... I think that'll work! Thanks!