DeepLcom / deepl-dotnet

Official .NET library for the DeepL language translation API.
MIT License
180 stars 25 forks source link

Issues with the HttpClient wrapped inside DeepLClient #29

Open BananaSupreme opened 1 year ago

BananaSupreme commented 1 year ago

Describe the bug Hey there! I wanted to let you know about a potential issue with how our application is set up.

Basically, the way HttpClient is being used by default can be a bit confusing for consumers, and the way it's documented isn't super clear. Specifically, if you're creating many ITranslator objects and disposing them you can create socket exhaustion issues in high throughput systems. This is because a new HttpClient is created with each ITranslator object.

To make things worse, there's another issue with long-running systems that use a singleton HttpClient, that requires pooling and timing the internal message handlers used by the HttpCleint. This is because the HttpHandler doesn't really respect DNS TTL, which can be a problem. So, it's generally recommended to use either the client with the PooledConnectionLifetime property or use IHttpClientFactory to help avoid these issues.

Even if you try to use a factory-created client in the options, you might still run into problems with polly policies being overridden, so a consumer would require setting up new Polly policies which may not be intended, and also not documented, which can be a pain to deal with.

Anyway, I just wanted to bring this to your attention so it can hopefully be sorted out.

DeeJayTC commented 1 year ago

Hey @BananaSupreme thanks for that, we had some other issues regarding our HTTPClient implementation earlier related to certificate problems and TLS.

We are thinking about changing that, exposing it so users can implement their own clients, etc. I'd be curious if you have any suggestions tho.

I know your scenario's there quite well and personally, I'd try to reuse the client as much as possible to avoid that, is that what you have in mind as well? Any other suggestions?

BananaSupreme commented 1 year ago

Hi :),

Less than a concrete plan I have some ideas that can maybe start a conversation going, I'd like to experiment how 100% everything I'm saying is possible as well, I think I can do that on Sunday :)

I think this problem can split a bit, because if the client gets injected into the DeepLClient, than in the world where DI is a thing (which I imagine is a common scenario for .NET users, but is worth checking. Also this is where I spend most of my professional life) Then an internal constructor to be used by DI can be made that inject the DeepLClient as a typed client created by the IHttpClientFactory, and the builder the factory exposes be exposed to the user so they can extend it as they see fit, could be a path.

If it's a secondary internal constructor for DI than it's not a breaking change, either.

For a non-DI world, I have some idea about maybe first steps but, also I'm not familiar with TLS issues with HttpClient, I never saw such problems (But again I use HttpClientFactory, and any mTLS concerns I've had I solved with nginx so maybe I just never had to take care of it) do you mind expanding?

I would probably firstly switch the HttpClientHandler used as the innerHandler to SockesHttpHandler which is the recommended version from .Net Core 2.1 and later and add the PooledConnectionLifetime property. According to the documentation this should solve the main pitfalls of the HttpClient.

In regards to your question about reusing the client, yes I completely agree, and the Translator recommended use as a singleton because it wraps a HttpClient should be documented.

It doesn't solve the creating your own client overrides resiliency, I'll give it some more thought, maybe just expose the client as an Action might put the user in weird situtations.

DeeJayTC commented 1 year ago

I was actually experimenting with an ASP.NET extension for the SDK to support the DI nature of controllers etc which might at least help with some of these issues, so in any ASP WebAPI or similar you'd just set it up with like:

services.AddDeepL(ApiKey)  // singleton default
or
services.AddDeepL(ApiKey, options => { mode = transient/scoped/whatever, httpClient = <myCustomClient>} )

Switching the HttpClientHandler is however something we really should do, agreed.

BananaSupreme commented 1 year ago

Yeah, I think so as well! And then if there is DI you can just have IHttpClientFactory create the client for you So: services.AddDeepL(ApiKey, options => { mode = transient/scoped/whatever }, clientBuilder => <This would be the IHttpClientBuilder exposed when configuring the IHttpClientFactory> )

Is there a reason why someone might want to use the client as transient for example? As this can lead to socket exhaustion, maybe send a wrong message? What do you think?

BananaSupreme commented 1 year ago

So I ran some tests, turns out the Client is registered as transient by default by the client factory and than the handler is just not disposed, so I added the ITranslator in the DI as transient as well, seems to work and no bloating of connections listed

One thing that came up during this POC is that Sockets handler doesnt exist in standard 2.0, so I split the logic. I'm not sure there's any good way to solve those problems than at standard 2.0

https://github.com/BananaSupreme/deepl-dotnet/pull/1/files

DeeJayTC commented 1 year ago

hey @BananaSupreme you can actually already add your own library by adding a custom function that returns HttpClientAndDisposeFlag to the TranslatorOptions.

Something like this:

 public static HttpClientAndDisposeFlag GetCustomHttpClient() {
    return new HttpClientAndDisposeFlag {
      HttpClient = new HttpClient(),
      DisposeClient = true
    };
  }

In regards to DI check this: https://github.com/DeepLcom/deepl-dotnet/blob/75fd2c6c062ef2c9ab145d953d7ede79f419fd07/src/DeepL.Service/DeepLService.cs#L42

https://github.com/DeepLcom/deepl-dotnet/blob/75fd2c6c062ef2c9ab145d953d7ede79f419fd07/samples/ASP.NET/Program.cs#L19

BananaSupreme commented 1 year ago

Hi @DeeJayTC,

IMHO, if I was a consumer of the system I would find it confusing to have to handle the client myself unless it was explicitly required. And wouldn't expect the implementation in the library to have faults, that require me to bypass it.

Besides if it would be registered as scoped the constant rebuilding of the HttpClient would exhaust the sockets

I understand there are many reasons to not prioritise certain issues, I think it might be worth giving it a fix, but I don't have the inside look into the team and the workload :)

The DI implementation looks really clean, a much better idea than mine, haha

DeeJayTC commented 1 year ago

Well you can just use it as a singleton i guess, no need for a scoped ir transient usage or do i miss something here?

BananaSupreme commented 1 year ago

The old HttpCilentHandler has a DNS reaolution issue in which it does not respect the TTL on DNS requests, so the singleton version becomes outdated at some point