Azure / azure-cosmos-dotnet-v3

.NET SDK for Azure Cosmos DB for the core SQL API
MIT License

DisableServerCertificateValidation=true results in SSL exception with linux emulator #4315

Open emyklebost opened 7 months ago

emyklebost commented 7 months ago

I was happy when I discovered that this feature was added to the SDK. However, I still get the "The SSL connection could not be established" exception when using the emulator. Am I misunderstanding this feature?

Example test:

[Fact]
public async Task Test()
{
    await using var testContainer = new CosmosDbBuilder().WithPortBinding(8081, 8081).Build();
    await testContainer.StartAsync();

    var connectionString = $"{testContainer.GetConnectionString()};DisableServerCertificateValidation=true";

    using var client = new CosmosClientBuilder(connectionString).Build();
    var result = await client!.CreateDatabaseIfNotExistsAsync("database-1"); // throws HttpRequestException due to SSL
}

Exception:

  Message: 
System.Net.Http.HttpRequestException : The SSL connection could not be established, see inner exception.
---- System.IO.IOException : Received an unexpected EOF or 0 bytes from the transport stream.

  Stack Trace: 
ConnectHelper.EstablishSslConnectionAsync(SslClientAuthenticationOptions sslOptions, HttpRequestMessage request, Boolean async, Stream stream, CancellationToken cancellationToken)
HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
HttpConnectionPool.AddHttp11ConnectionAsync(QueueItem queueItem)
TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
CosmosHttpClientCore.ExecuteHttpHelperAsync(HttpRequestMessage requestMessage, ResourceType resourceType, CancellationToken cancellationToken)
CosmosHttpClientCore.SendHttpHelperAsync(Func`1 createRequestMessageAsync, ResourceType resourceType, HttpTimeoutPolicy timeoutPolicy, IClientSideRequestStatistics clientSideRequestStatistics, CancellationToken cancellationToken)
<19 more frames...>
ClientContextCore.RunWithDiagnosticsHelperAsync[TResult](String containerName, String databaseName, OperationType operationType, ITrace trace, Func`2 task, Func`2 openTelemetry, String operationName, RequestOptions requestOptions)
<<OperationHelperWithRootTraceWithSynchronizationContextAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
UnitTest1.Test() line 18

Source code

sourabh1007 commented 7 months ago

Are you using a custom HttpClientFactory? If so, it would override this setting.
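For readers unsure what a custom factory looks like, here is a minimal sketch. The class name EmulatorClientOptions and its Create method are hypothetical and not part of the SDK; CosmosClientOptions.HttpClientFactory and HttpClientHandler.DangerousAcceptAnyServerCertificateValidator are existing APIs. It assumes a local emulator with a self-signed certificate and should not be used against a real account.

```csharp
using System.Net.Http;
using Microsoft.Azure.Cosmos;

// Hypothetical helper for local emulator tests only.
public static class EmulatorClientOptions
{
    public static CosmosClientOptions Create()
    {
        return new CosmosClientOptions
        {
            // When a custom factory is supplied, the SDK uses the HttpClient it
            // returns, so certificate handling must be configured on this handler.
            HttpClientFactory = () => new HttpClient(new HttpClientHandler
            {
                // Accepts any server certificate. Unsafe outside local testing.
                ServerCertificateCustomValidationCallback =
                    HttpClientHandler.DangerousAcceptAnyServerCertificateValidator
            })
        };
    }
}
```

The options would then be passed to the client, for example new CosmosClient(connectionString, EmulatorClientOptions.Create()). If no such factory is configured anywhere in the test, this is not the cause of the failure.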

kirankumarkolli commented 7 months ago

@emyklebost can you please upgrade to 3.38.1 and validate?

3.38.0 has an issue, and it's addressed in 3.38.1 (ref: https://github.com/Azure/azure-cosmos-dotnet-v3/pull/4294)

emyklebost commented 7 months ago

@sourabh1007 No, not using any custom setup. The entire setup is included in the post, which also includes a .zip with a Solution to reproduce the issue.

@kirankumarkolli Tried to update to 3.38.1 and 3.39.0-preview.1, same issue.

kirankumarkolli commented 7 months ago

@sourabh1007 can you please try the repro?

kirankumarkolli commented 6 months ago

Debugged it offline together with @sourabh1007.

The exception System.IO.IOException : Received an unexpected EOF or 0 bytes from the transport stream happens because the emulator is not yet up and ready. It's an issue with the testcontainer package. Once the emulator is ready, CosmosClient is able to connect to it successfully. Until a fix is available from the emulator container, one possible workaround is to delay the test/validation:

using var client = new CosmosClientBuilder(connectionString).Build();

The testcontainer setup has a NetworkSettings section with a different address:

"NetworkSettings": {
        "Bridge": "",
        "SandboxID": "0d639f30a09443c08d4e3c5c359fcec3a2b36eba423494f01c41fcf503ec031e",
        "SandboxKey": "/var/run/docker/netns/0d639f30a094",
        "Ports": {
            "8081/tcp": [
                {
                    "HostIp": "0.0.0.0",
                    "HostPort": "8081"
                }
            ]
        },
        "HairpinMode": false,
        "LinkLocalIPv6Address": "",
        "LinkLocalIPv6PrefixLen": 0,
        "SecondaryIPAddresses": null,
        "SecondaryIPv6Addresses": null,
        "EndpointID": "e0ff54ce39c6b2ee4fb8d6ba89d1cd97afc2786777c8a1ba92b9d447c2fa6a77",
        "Gateway": "172.17.0.1",
        "GlobalIPv6Address": "",
        "GlobalIPv6PrefixLen": 0,
        **"IPAddress": "172.17.0.3"**,
        "IPPrefixLen": 16,
        "IPv6Gateway": "",
        "MacAddress": "02:42:ac:11:00:03",
        "Networks": {
            "bridge": {
                "IPAMConfig": null,
                "Links": null,
                "Aliases": null,
                "MacAddress": "02:42:ac:11:00:03",
                "NetworkID": "c74c36703258a6c76044f5f834f4197513168486f525d1d2a97598f29b44ccf1",
                "EndpointID": "e0ff54ce39c6b2ee4fb8d6ba89d1cd97afc2786777c8a1ba92b9d447c2fa6a77",
                "Gateway": "172.17.0.1",
                **"IPAddress": "172.17.0.3"**,
                "IPPrefixLen": 16,
                "IPv6Gateway": "",
                "GlobalIPv6Address": "",
                "GlobalIPv6PrefixLen": 0,
                "DriverOpts": null,
                "DNSNames": null
            }
        }
    }

This results in the SDK trying to connect over "172.17.0.3" and failing. One workaround for now is to limit the client to the global endpoint with the following change to the code.

using var client = new CosmosClientBuilder(connectionString).WithLimitToEndpoint(true).Build();

Chris-Greaves commented 6 months ago

Still having this issue with LimitToEndpoint = true.

Cut-down version of my test logic:

using Microsoft.Azure.Cosmos.Fluent;
using NUnit.Framework;
using Testcontainers.CosmosDb;

namespace Project.Tests;

public class CosmosDBContainerTest
{
    private readonly CosmosDbContainer _cosmosDBContainer = new CosmosDbBuilder().Build();

    [OneTimeSetUp]
    public Task OneTimeSetup()
    {
        return Task.WhenAll([
            _cosmosDBContainer.StartAsync()
        ]);
    }

    [OneTimeTearDown]
    public Task OneTimeTearDown()
    {
        return Task.WhenAll([
            _cosmosDBContainer.DisposeAsync().AsTask()
        ]);
    }

    [Test]
    public async Task TestContainerIsCallable()
    {
        var connectionString = $"{_cosmosDBContainer.GetConnectionString()};DisableServerCertificateValidation=true";

        using var client = new CosmosClientBuilder(connectionString).WithLimitToEndpoint(true).Build();
        var result = await client!.CreateDatabaseIfNotExistsAsync("database-1"); // throws HttpRequestException due to SSL
    }
}

My stack trace:

GlobalEndpointManager: Fail to reach gateway endpoint https://127.0.0.1:63468/, System.Net.Http.HttpRequestException: The SSL connection could not be established, see inner exception.
 ---> System.IO.IOException: Received an unexpected EOF or 0 bytes from the transport stream.
   at System.Net.Security.SslStream.ReceiveHandshakeFrameAsync[TIOAdapter](CancellationToken cancellationToken)
   at System.Net.Security.SslStream.ForceAuthenticationAsync[TIOAdapter](Boolean receiveFirst, Byte[] reAuthenticationData, CancellationToken cancellationToken)
   at System.Net.Security.SslStream.ProcessAuthenticationWithTelemetryAsync(Boolean isAsync, CancellationToken cancellationToken)
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsync(SslClientAuthenticationOptions sslOptions, HttpRequestMessage request, Boolean async, Stream stream, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.EstablishSslConnectionAsync(SslClientAuthenticationOptions sslOptions, HttpRequestMessage request, Boolean async, Stream stream, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(QueueItem queueItem)
   at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.HttpConnectionWaiter`1.WaitForConnectionWithTelemetryAsync(HttpRequestMessage request, HttpConnectionPool pool, Boolean async, CancellationToken requestCancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at Microsoft.Azure.Cosmos.CosmosHttpClientCore.ExecuteHttpHelperAsync(HttpRequestMessage requestMessage, ResourceType resourceType, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.CosmosHttpClientCore.SendHttpHelperAsync(Func`1 createRequestMessageAsync, ResourceType resourceType, HttpTimeoutPolicy timeoutPolicy, IClientSideRequestStatistics clientSideRequestStatistics, CancellationToken cancellationToken)
   at Microsoft.Azure.Cosmos.GatewayAccountReader.GetDatabaseAccountAsync(Uri serviceEndpoint)
   at Microsoft.Azure.Cosmos.Routing.GlobalEndpointManager.GetAccountPropertiesHelper.GetAndUpdateAccountPropertiesAsync(Uri endpoint)

Chris-Greaves commented 6 months ago

So I got it to work, here's everything that I needed to do:

The CosmosDB Emulator image takes FOREVER to start!

I added a Thread.Sleep(180000) at the beginning of my test to ensure it has enough time to start before running the tests. The WaitUntil being used before was misleading and didn't accurately reflect when the service was up and ready to be used. (See https://github.com/testcontainers/testcontainers-dotnet/issues/1107). Unfortunately the code fix hasn't made it to a public release of the library yet, so you can either implement this manually, or just stick a Sleep in until the fix is released.

There are Environment Variables that are needed to get the Emulator working on Linux hosts

The two environment variables you need are AZURE_COSMOS_EMULATOR_IP_ADDRESS_OVERRIDE and AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE.

The IP address override is required to stop the emulator telling the client to call the docker internal IP address. If you want the test to interact with the DB directly, then you'll want to set this to 127.0.0.1, but possibly for CI pipelines or as a dependency for another test container, this might need tweaking.

Data Persistence is needed to stop the SDK from hanging.

See https://github.com/MicrosoftDocs/azure-docs/issues/95589 for the original issue.

Bonus Environment Variable, you can reduce the number of partitions

AZURE_COSMOS_EMULATOR_PARTITION_COUNT can be used to lower the number of partitions, which should speed up loading times.

Not all the ports required are opened by default.

CosmosDB Client will use the ports 10250-10255 for connections to the DB, if these aren't open the SDK will fail to action anything.

TL;DR

    private readonly CosmosDbContainer _cosmosDBContainer = new CosmosDbBuilder()
        .WithEnvironment("AZURE_COSMOS_EMULATOR_PARTITION_COUNT", "3")
        .WithEnvironment("AZURE_COSMOS_EMULATOR_ENABLE_DATA_PERSISTENCE", "true")
        .WithEnvironment("AZURE_COSMOS_EMULATOR_IP_ADDRESS_OVERRIDE", "127.0.0.1")
        .WithPortBinding(10250)
        .WithPortBinding(10251)
        .WithPortBinding(10252)
        .WithPortBinding(10253)
        .WithPortBinding(10254)
        .WithPortBinding(10255)
        .Build();

    [OneTimeSetUp]
    public Task OneTimeSetup()
    {
        return Task.WhenAll([
            _cosmosDBContainer.StartAsync()
        ]);
    }

    [OneTimeTearDown]
    public Task OneTimeTearDown()
    {
        return Task.WhenAll([
            _cosmosDBContainer.DisposeAsync().AsTask()
        ]);
    }

    [Test]
    public async Task TestContainerIsCallable()
    {
        // Crude fixed wait (3 minutes) so the emulator can finish starting.
        Thread.Sleep(180000);

        var connectionString = $"{_cosmosDBContainer.GetConnectionString()};DisableServerCertificateValidation=true";

        using var client = new CosmosClientBuilder(connectionString).WithLimitToEndpoint(true).Build();
        var result = await client!.CreateDatabaseIfNotExistsAsync("database-1");
    }

aahmed-dfe commented 6 months ago

Rather than Thread.Sleep, this may help you do the polling:

   new CosmosDbBuilder()
      .WithWaitStrategy(Wait.ForUnixContainer().UntilPortIsAvailable(CosmosDbBuilder.CosmosDbPort))

The CosmosDb container builder also has some built in polling for the logs from STDOUT

Chris-Greaves commented 6 months ago

"The CosmosDb container builder also has some built in polling for the logs from STDOUT"

This message in STDOUT doesn't accurately reflect the status of the API or UI. This is super misleading, and essentially can't be trusted. Instead it's suggested that you wait for the certificate endpoint to start responding, and then you know definitively that the emulator has started.

I haven't tested the UntilPortIsAvailable, but if it works for you that's great!

To add a custom WaitUntil for the cosmosDB container, see https://github.com/testcontainers/testcontainers-dotnet/pull/1109/files
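As a rough illustration of waiting on the certificate endpoint, the sketch below polls it until it answers or a timeout elapses. It is not the code from the linked pull request. The class and method names are hypothetical, and the /_explorer/emulator.pem path is assumed to be where the emulator serves its certificate.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; not part of Testcontainers or the Cosmos SDK.
public static class EmulatorReadiness
{
    // Returns true once the certificate endpoint responds with a success
    // status code, or false if the timeout elapses first.
    public static async Task<bool> WaitUntilReadyAsync(
        Uri emulatorEndpoint,
        TimeSpan timeout,
        TimeSpan pollInterval,
        CancellationToken cancellationToken = default)
    {
        // The emulator uses a self-signed certificate, so validation is
        // skipped for this probe. Only acceptable for local testing.
        using var handler = new HttpClientHandler
        {
            ServerCertificateCustomValidationCallback =
                HttpClientHandler.DangerousAcceptAnyServerCertificateValidator
        };
        using var http = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(5) };

        // Assumed path of the emulator's certificate endpoint.
        var probe = new Uri(emulatorEndpoint, "/_explorer/emulator.pem");
        var elapsed = Stopwatch.StartNew();

        while (elapsed.Elapsed < timeout)
        {
            cancellationToken.ThrowIfCancellationRequested();
            try
            {
                using var response = await http.GetAsync(probe, cancellationToken);
                if (response.IsSuccessStatusCode)
                {
                    return true;
                }
            }
            catch (HttpRequestException)
            {
                // Not listening yet, or the TLS handshake was cut short. Retry.
            }
            catch (TaskCanceledException) when (!cancellationToken.IsCancellationRequested)
            {
                // The per-request timeout fired. Retry.
            }

            await Task.Delay(pollInterval, cancellationToken);
        }

        return false;
    }
}
```

In a test, this could replace the fixed sleep with something like await EmulatorReadiness.WaitUntilReadyAsync(new Uri("https://127.0.0.1:8081/"), TimeSpan.FromMinutes(5), TimeSpan.FromSeconds(2)), using whatever host and port the container is actually mapped to.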

kirankumarkolli commented 5 months ago

Thanks @Chris-Greaves.

We are working with the emulator team on ways to wait for readiness. We will circle back on this thread with a plan forward.