Azure / azure-kusto-java

Microsoft Azure Kusto Library for Java
MIT License
38 stars 43 forks source link

Asynchronous HTTP Client #249

Open The-Funk opened 2 years ago

The-Funk commented 2 years ago

Describe the solution you'd like Utilize an asynchronous HTTP client instead of Apache's CloseableHttpClient in order to provide a nonblocking transport of data from the ADX database to my Java application.

Describe alternatives you've considered NA

Additional context The current implementation of the ADX library uses the Apache HTTP Client. This client hasn't received an update in 2 years. A good asynchronous alternative might be to use the vert.x web client. https://vertx.io/docs/vertx-web-client/java/ https://mvnrepository.com/artifact/io.vertx/vertx-web-client

The-Funk commented 2 years ago

Side note, you might want to offer multiple "flavors" for http client since several asynchronous clients exist that are rather opinionated and provide their own "future" or async types. vert,x is one of these.

The-Funk commented 2 years ago

I submitted a PR to bump the Apache HTTP Client to version 5 which has async support. For the most part the code from version 4 could be reused. Very few breaking changes. Might be worth looking into.

I did also bump the minimum Java version to 11 since Java 8 premier support ended earlier this year.

The-Funk commented 2 years ago

Any update on this? I am running into issues where trying to query data fast enough is not possible due to ADX' 64MB query limit and the fact that there is still no async support.

The-Funk commented 2 years ago

Just checking in again to see if anything is being done on this. CompletableFuture support from Apache HTTPClient 5 would be enough to resolve most of my issues I think.

AsafMah commented 1 year ago

Sadly this feature will be postponed for now. When we will add it, we will probably move to new standard http client in java 11.

The-Funk commented 1 year ago

I noticed the Java 11 implementation has gone through, would you like help refactoring to use the Java 11 client?

The-Funk commented 1 year ago

@AsafMah do you know if this is still in the pipeline along with the native Java client?

I'm also noticing that this library using msal4j, are there plans to switch to azure-identity at some point?

yogilad commented 1 year ago

Regarding use of MSAL, we do not plan on rewriting all the auth code using Azure.Identity, however we should provide a WithAzureTokenCredential() API for customers to work with.

@AsafMah , consider adding this to Ron's task list so this gets done sooner then later.

The-Funk commented 1 year ago

@yogilad do you know if async operation is still in the pipeline for the library? I would like to reduce the number of threads that must be allocated to kusto-java without impacting performance.

yogilad commented 1 year ago

This work not is planned anytime soon. The main reason is, it requires rearchitecting the SDK to avoid code duplications which comes at a high cost.

The-Funk commented 11 months ago

@yogilad @AsafMah @ohadbitt I have extra availability in the coming weeks, would you like me to take a shot at upgrading the project to use httpclient5 again, or potentially switching to the Java 11 built-in HTTP Client?

Impact of moving to the Java 11 client:

Note: The last time I tried this, I moved to httpclient5. For the most part the existing code could be kept/reused however client 5 does get rid of connection pooling features, instead using async io under the hood to handle concurrent connections. This is something that might need tested.

If you have any interest, let me know.

ohadbitt commented 11 months ago

Hi @The-Funk Cole, We decided that any future change in our httpClient should move to align with other Azure SDKs httpClient usage which is using projectReactor and the underlying reactor-netty client.

API shouldn't change much besides adding an Async interface as the Mono returned by storage async client can be translated to CompleteableFuture

We need to make sure this would work with the factory methods where user provides the client himself - so that we can support proxy configurations in both msal authentication and Blob storage operations.

ohadbitt commented 11 months ago

@The-Funk , are the thumbs up meaning you want to implement it our way or wait for us to complete this task some time in the future?

The-Funk commented 11 months ago

@ohadbitt I can give it the old college try!

I am wondering how much effort it will be.

The use of entity is what concerns me most. The concept of an HTTP entity is basically all but gone in every modern HTTP client. I think Apache httpclient4 was the last one holding onto it.

I am thinking of easy ways to work around its usage in the existing codebase. An entity is just a few specific headers and the message body....

I am also wondering about how the HTTP client is currently passed around in the code.

If you move to using async under the hood like the Netty client does, you won't need client pools, so you could just use a single HTTP client and hold open connections. I think that would be a lot more efficient. The question is how to implement it.

I would guess you wouldn't want to lock the users into only having one client, so a singleton is out. What if a user wants to connect to two disparate ADX databases? But I am thinking that you could store the inner HTTP client as an instance property of the KustoClient abstraction layer. Then you could create as many (or as few) Kusto Clients as you want, to connect to as many (or as few) ADX clusters as you want, and each KustoClient only maintains 1 HTTP client.

Does that sound like a decent gameplan to y'all?

ohadbitt commented 11 months ago

Yea this sounds good - we don't do client pools - you can see the httpclient is already a member of the KustoClient - so no work here You might want to think about starting with a small PR just to moving using the netty client with current impl - see how it works. Second big PR will be to make async API

The-Funk commented 10 months ago

@AsafMah @ohadbitt @yogilad

With the holidays approaching I've taken a significant amount of PTO to work on projects I find interesting and that are useful to me. If you would like, I can begin work on a second PR to address adding async options with the new HTTP client.

I was thinking something along the lines of just mirroring the sync APIs in a new interface. I would use the same names like execute and executeToJson but the return types in the new interface would be CompletableFutures or Monos depending on your preference.

From there I could create a new client implementation of said interface and add it to the factory class.

I'll try to make use of when() and proper chaining with the Netty async API.

ohadbitt commented 10 months ago

Hey @The-Funk As i said today in the PR - main issue here would be to test it well would you like to take this effort as well ? If so its better to do it first as its relevant to both PRs We can discuss it here or on the PR

The-Funk commented 7 months ago

@ohadbitt @AsafMah

I've given this some thought. There's a few directions we could go for the async client.

As is, there are 12 methods in the Client interface for the sync client. However in the existing client implementation all of these methods chain to just 2 implementations. One for execute() and one for executeToJsonResult(). The rest are just helper methods that chain to one of these 2.

A few directions we could go:

That PR from yesterday introduces a KustoQuery object and if it is something that interests y'all, we could use that PR to deprecate the method chaining from V5 and below (without deleting them at this time). If we went the route of introducing the wrapper object first, we could use the existing Client interface and Client implementation and add just 2 additional methods that would look like the following:

   public Mono<KustoOperationResult> execute(KustoQuery kq) {
      // implementation
   }

   public Mono<String> executeToJsonResult(KustoQuery kq) {
      // implementation
   }

Let me know what y'all think. I'm open to any of these approaches.

Disclaimer: Because the azure-core HTTP Client's default async return type is Flux/Mono, I figure it would be best to just stick with these types for the async results. These are reactive types, which gives them significantly more utility by default than Java Futures/CompleteableFutures and there are adapters for these reactor types for just about every other reactive library. I myself will probably convert them into Mutiny-API equivalents in my projects.

AsafMah commented 7 months ago

For your question, we want to decrease the number of overloads in general:

We can use KustoQuery internally if it helps.

So the amount is much more manageable.

As for the API methods - let's have them in the same object as new methods (that end with async).

At the end, we want all of our internals to be async, and the sync methods will be simple wrappers that block.

As for the order of work, I think we should start with data, then ingest, in a bottom-up manner:

We can talk over it on a call if you're available.

The-Funk commented 6 months ago

@AsafMah I should have availability for a call sometime this week. What is the best way to get in contact with y'all? Teams? Discord?

Regarding the order of operations, I think the above sounds good.

The-Funk commented 6 months ago

@AsafMah I have started working on this locally. Will try to push some changes to my existing KustoQuery branch to show some PoC work.

The-Funk commented 6 months ago

Update: I am still working in my branch. Haven't had a chance to push anything yet but I am hoping to push code this weekend to show progress.

AsafMah commented 6 months ago

Hello, we have been on a holiday for the last two weeks so we haven't been able to keep in touch.

Let's arrange a team meeting, how can we reach you?

The-Funk commented 6 months ago

If you email cole.snyder@gmork.tech that should work. I can also accept Teams/Zoom invitations there.