Azure / azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
MIT License
2.35k stars 1.99k forks source link

[BUG] Netty HTTP Server exception for There is a blocking call in the CosmosClientAsync #40051

Open alejojarahi opened 6 months ago

alejojarahi commented 6 months ago

Describe the bug They do a dynamic lookup on the warmup function in the the Netty HttpClient and then invokes this. They do this by blocking.

In the cosmos library this warmup call does not seem to be configurable.

Exception or Stack Trace

java.lang.RuntimeException: java.lang.IllegalStateException: block()/blockFirst()/blockLast() are blocking, which is not supported in thread default-nioEventLoopGroup-1-3
    at com.azure.cosmos.implementation.http.ReactorNettyClient.attemptToWarmupHttpClient(ReactorNettyClient.java:116)
    at com.azure.cosmos.implementation.http.ReactorNettyClient.createWithConnectionProvider(ReactorNettyClient.java:96)
    at com.azure.cosmos.implementation.http.HttpClient.createFixed(HttpClient.java:62)
    at com.azure.cosmos.implementation.RxDocumentClientImpl.httpClient(RxDocumentClientImpl.java:744)
    at com.azure.cosmos.implementation.RxDocumentClientImpl.<init>(RxDocumentClientImpl.java:508)
    at com.azure.cosmos.implementation.RxDocumentClientImpl.<init>(RxDocumentClientImpl.java:335)
    at com.azure.cosmos.implementation.RxDocumentClientImpl.<init>(RxDocumentClientImpl.java:295)
    at com.azure.cosmos.implementation.AsyncDocumentClient$Builder.build(AsyncDocumentClient.java:275)
    at com.azure.cosmos.CosmosAsyncClient.<init>(CosmosAsyncClient.java:170)
    at com.azure.cosmos.CosmosClientBuilder.buildAsyncClient(CosmosClientBuilder.java:1084)
    at com.azure.cosmos.CosmosClientBuilder.buildAsyncClient(CosmosClientBuilder.java:1071)
    at io.micronaut.azure.cosmos.client.CosmosClientFactory.buildCosmosAsyncClient(CosmosClientFactory.java:59)

To Reproduce

  1. Add implementation 'com.azure:azure-cosmos:4.57.0' in Java Gradle
  2. Use Netty HTTP Server
  3. Use Reactor
  4. In the Framework (Micronaut 4 or Springboot 3) with Netty HTTP Server, create connection in request context. For example create connection with Singleton Pattern.
  5. Request to Repository.

Code Snippet

import com.azure.cosmos.CosmosAsyncDatabase;
import com.azure.cosmos.CosmosClientBuilder;
import jakarta.inject.Singleton;

@Singleton
public class UserRepository {

    private static final String DATABASE_NAME = "test";

    CosmosAsyncDatabase cosmosAsyncDatabase = null;

    public CosmosAsyncDatabase getInstance() {
        if(cosmosAsyncDatabase == null) {
            cosmosAsyncDatabase = new CosmosClientBuilder()
                    .endpoint("String Connection")
                    .key("Key Connection")
                    .buildAsyncClient()
                    .getDatabase(DATABASE_NAME);
        }
        return cosmosAsyncDatabase;
    }

    public Flux<User> getAll() {
        return getInstance().getContainer("users")
                            .queryItems("SELECT * FROM users", User.class);
    }
}

In Springboot or Micronaut Controller

private final ClienteRepository clienteRepository = new ClienteRepository();

@GetMapping("/all")
public Flux<UsuarioDto> getAll() {
    return clienteRepository.getAll();
}

In Micronaut Framework there is a setting that also leads to the same problem https://github.com/micronaut-projects/micronaut-azure/blob/5.5.x/azure-cosmos/src/main/java/io/micronaut/azure/cosmos/client/CosmosClientFactory.java

Expected behavior The connection to the database is created and the query can be executed without any exception or thread blocking

Setup (please complete the following information):

Additional context On stackoverflow they expand a little more on the same error that is presented in actual version https://stackoverflow.com/questions/73412719/create-recreate-azure-cosmosdb-async-client-from-java-reactor-context

Information Checklist Kindly make sure that you have added all the following information above and checkoff the required fields otherwise we will treat the issuer as an incomplete report

github-actions[bot] commented 6 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @kushagraThapar @pjohari-ms @TheovanKraay.

kushagraThapar commented 6 months ago

thanks @alejojarahi for creating this issue, looks like a genuine problem in case of async client creation. @alzimmermsft I am curious, if this is the same case in azure-core as well and have we solved it? I would like to keep the solutions consistent if possible. Referencing the PR where we added this, will be good to re-iterate our approach to warming up http clients in case of async clients.

kushagraThapar commented 6 months ago

IMO, we should remove the blocking warmup of the http client. cc: @xinlian12 / @FabianMeiswinkel

alzimmermsft commented 6 months ago

thanks @alejojarahi for creating this issue, looks like a genuine problem in case of async client creation. @alzimmermsft I am curious, if this is the same case in azure-core as well and have we solved it? I would like to keep the solutions consistent if possible. Referencing the PR where we added this, will be good to re-iterate our approach to warming up http clients in case of async clients.

azure-core-http-netty doesn't perform a warmup of the Netty HttpClient

kushagraThapar commented 6 months ago

thanks @alzimmermsft for the confirmation. @alejojarahi - actually the issue here is not the warmup, rather the way you are creating the cosmos client. The code snippet you pasted in the issue description is creating multiple cosmos clients (per database) which is not recommended. We strongly recommend to only create a singleton client.

azure-cosmos client or any other client if needs to warmup, they will need to call certain APIs that will allow the client to warmup. Ideally, you would want to have a warmed-up client and then only enable the traffic on it (through nio threads or whatever). You would want to warmup the client on the main thread before opening it up for traffic.

If your ask is to make this configurable, we can think about doing so.

alejojarahi commented 6 months ago

Hi @kushagraThapar. Thanks for the reply.

This error also occurs in the scenario that CosmosAsyncClient is instantiated only onces. Example in Springboot 3

import com.azure.cosmos.CosmosAsyncClient;
import com.azure.cosmos.CosmosClientBuilder;
import reactor.core.publisher.Flux;

public class UserRepository {
    private static final String DATABASE_NAME = "test";

    CosmosAsyncClient cosmosAsyncClient = null;

    public CosmosAsyncClient getInstance() {
        if(cosmosAsyncClient == null) {
            cosmosAsyncClient = new CosmosClientBuilder()
                    .endpoint("ENDPOINT")
                    .key("KEY")
                    .buildAsyncClient();
        }
        return cosmosAsyncClient;
    }

    public Flux<User> getAll() {
        return getInstance().getDatabase(DATABASE_NAME)
                            .getContainer("users")
                            .queryItems("SELECT * FROM users", User.class);
    }
}
17:44:54.516 Data 06-may.-2024 | ERROR | Thread [reactor-http-nio-3] | c.a.c.i.RxDocumentClientImpl | unexpected failure in initializing client.
java.lang.RuntimeException: java.lang.IllegalStateException: block()/blockFirst()/blockLast() are blocking, which is not supported in thread reactor-http-nio-3
    at com.azure.cosmos.implementation.http.ReactorNettyClient.attemptToWarmupHttpClient(ReactorNettyClient.java:116)
    at com.azure.cosmos.implementation.http.ReactorNettyClient.createWithConnectionProvider(ReactorNettyClient.java:96)
    at com.azure.cosmos.implementation.http.HttpClient.createFixed(HttpClient.java:62)
    at com.azure.cosmos.implementation.RxDocumentClientImpl.httpClient(RxDocumentClientImpl.java:744)
    at com.azure.cosmos.implementation.RxDocumentClientImpl.<init>(RxDocumentClientImpl.java:508)
    at com.azure.cosmos.implementation.RxDocumentClientImpl.<init>(RxDocumentClientImpl.java:335)
    at com.azure.cosmos.implementation.RxDocumentClientImpl.<init>(RxDocumentClientImpl.java:295)
    at com.azure.cosmos.implementation.AsyncDocumentClient$Builder.build(AsyncDocumentClient.java:275)
    at com.azure.cosmos.CosmosAsyncClient.<init>(CosmosAsyncClient.java:170)
    at com.azure.cosmos.CosmosClientBuilder.buildAsyncClient(CosmosClientBuilder.java:1084)
    at com.azure.cosmos.CosmosClientBuilder.buildAsyncClient(CosmosClientBuilder.java:1071)
    at co.com.springexample.entrypoint.UserRepository.getInstance(UserRepository.java:19)

If the reactive client is subject to only being able to start by on the main thread, this can limit or affect the architecture of a project. For example this use case https://stackoverflow.com/questions/73412719/create-recreate-azure-cosmosdb-async-client-from-java-reactor-context.

If the team does not consider it a bug, it should be a configurable functionality

kushagraThapar commented 6 months ago

@alejojarahi the stack overflow code snippet you mentioned is also not a valid use case of the cosmos client, because the customer is trying to create a new cosmos client every time they need to rotate their keys.

SecretAsyncClient secretAsyncClient = new SecretClientBuilder().buildAsyncClient()
...
Mono<CosmosAsyncClient> client = secretAsyncClient.getSecret(KEY_NAME).map(
  s -> s.getValue()
).map(
  key -> new CosmosClientBuilder()
                        .endpoint(HOST)
                        .key(key)
                        .buildAsyncClient()
);
return client.flatMap(
   ...
);

Instead they should use a key rotation pattern like some form of KeyCredential object which has an update() API to update the key on the fly.

Creating clients randomly and during the execution of the application is never the right approach, as client creation is considered a heavy operation, specially in case of Cosmos SDK where it needs to get Database account, metadata, fill the caches, create TCP / HTTP connections, etc. when the client boots up. Any application which is doing this on nio thread is basically blocking their I/O resources for unnecessary client creation.

horvathb-avaya commented 3 months ago

Hi @kushagraThapar , we are facing the same issue with Micronaut 4. We are creating the Cosmos DB via application config and annotation, so using the micronaut implementation without any change. Based on @alejojarahi 's example about creating one async client in the comment dated 7th of May, a direct async client creation wouldn't work neither.

My question would be if there are any plans to solve this, or should we start investigating/raising issues on Spring/Micronaut side?

Thanks for your help!

kushagraThapar commented 3 months ago

@horvathb-avaya - I still wonder what's the actual error stack trace in your case. The issue should only happen if you are calling a blocking call in reactor async API on I/O scheduler which is a concern in its own.

This can be solved simply by using your own scheduler (either bounded elastic or parallel) which is provided by Reactor.

If your clients are creating automatically through annotation or application config then they shouldn't be running on I/O threads because if so, it will create further problems in your application by hijacking I/O resources for tasks which are not related to I/O. Can you please provide more details on the error stack trace?