HubSpot / slack-client

An asynchronous HTTP client for Slack's web API
Apache License 2.0

Expected usage / lifecycle? #170

Closed · lightbody closed this 7 months ago

lightbody commented 4 years ago

We recently discovered that our use of this library was leading to eventual resource starvation ("too many files"). In short, each call to SlackClientFactory.create() results in 5 additional epoll/kqueue entries in lsof. This makes sense of course, since any API client needs a pool of connections to make service calls.

The issue is that the ergonomics of the API lead you down the path of creating a SlackClient per user / access token. In some cases you'll have a single bot access token shared across all your users, but in other cases you'll have an access token per user. In the latter, you can end up with an ever-growing set of separate NingAsyncHttpClient instances. While they all share the same NioHttpClient.Factory, they appear to each create their own pool of resources.

So without calling SlackClient.close(), which isn't mentioned in the readme, you can easily end up with resource starvation.
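
To make that concrete, the shape of our code is roughly the sketch below: a fresh client per user access token, and nothing ever calls close(). (I'm going from memory of the README here, so treat the exact factory/builder calls as illustrative.)

```java
import com.hubspot.slack.client.SlackClient;
import com.hubspot.slack.client.SlackClientFactory;
import com.hubspot.slack.client.SlackClientRuntimeConfig;

// Illustrative sketch of the leaking pattern: one SlackClient per user
// access token, never closed, so every client keeps its own pool of
// epoll/kqueue handles alive.
class PerUserSlackClients {
  SlackClient clientFor(String userAccessToken) {
    SlackClientRuntimeConfig config = SlackClientRuntimeConfig.builder()
        .setTokenSupplier(() -> userAccessToken)
        .build();

    // Each call here ends up with its own NingAsyncHttpClient underneath.
    return SlackClientFactory.defaultFactory().build(config);
  }
}
```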

I'd be happy to help modify the docs/readme to be clearer, but my first question is: what is the correct way to instantiate a SlackClient? Should multi-tenant services keep caches of them per user? Should we create and close one for every request that we handle? Although I've submitted a bunch of patches, I haven't looked closely at this part of the code before now. I'll keep investigating, but any feedback on proper usage lifecycle would be really helpful.

Thanks.

lightbody commented 4 years ago

Oops, I should add: even bot access tokens require an unbounded number of calls to the factory's create(), if you have many users across many Slack workspaces. It's just made even worse if you require user access tokens.

szabowexler commented 4 years ago

Interesting. So full disclosure: it sounds like you're using the client to do something which I haven't had personal experience with (which is probably part of why the ergonomics may be awkward). The client itself inherits from Closeable (and by extension AutoCloseable), so the idea is that clients should stick around only as long as they are necessary. I think the implicit idea behind the current design is that if you have long lived clients, you will have a relatively small number of them, and if you have short lived clients, then it's not an issue if you're opening/closing them frequently.

I don't think we have any internal use cases where we have many long lived clients atm (and because we use Guice extensively internally, auto-closeables are automatically closed by our cleanup which I'm guessing guards us from that leak).

Naively, it sounds like what might be missing in the API is some sort of configuration hint that allows you to suggest you want to use a single set of resources across multiple clients. Though while that might protect you from starvation, you might also run into nasty tuning issues and lack of fairness (i.e. local starvation in a specific client).

lightbody commented 4 years ago

I don't actually need long-lived clients, I just didn't realize until now I had to close them / encase them in a try-with-resources block 🤣

I think the simplest thing for now appears to be to instantiate a SlackClient just-in-time, use it, and then discard it. That will be the same behavior I have today, except without the leak.

My only concern is if I'm even doing this the way it was intended. For example, am I getting the benefits of any sort of HTTP connection pool between client invocations? Does the re-use of NioHttpClient.Factory (via NioHttpClientFactory.defaultFactory()) ensure that?
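
Concretely, I'm picturing something like the sketch below for the just-in-time approach (again going from memory of the README, so the exact calls are illustrative):

```java
import java.io.IOException;

import com.hubspot.slack.client.SlackClient;
import com.hubspot.slack.client.SlackClientFactory;
import com.hubspot.slack.client.SlackClientRuntimeConfig;

class JustInTimeUsage {
  void handleRequest(String accessToken) throws IOException {
    SlackClientRuntimeConfig config = SlackClientRuntimeConfig.builder()
        .setTokenSupplier(() -> accessToken)
        .build();

    // Build a client for this one unit of work and discard it afterwards;
    // close() should release the underlying HTTP client's resources.
    try (SlackClient slack = SlackClientFactory.defaultFactory().build(config)) {
      // ... make API calls with `slack` here ...
    }
  }
}
```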

lightbody commented 4 years ago

Oh, I should add: I tried wrapping my code in a try/finally that closed the client, and I immediately got errors indicating I was closing the client before the CompletableFuture callbacks had a chance to complete, resulting in a bunch of exceptions. So I'm still at a loss about how I'm really supposed to use this thing.

Maybe a non-Guice example to get me going?

PS: My app uses Micronaut, which has its own HTTP client that some day I'd love to see as a pluggable alternative to the Ning HTTP client currently used.

lightbody commented 4 years ago

Sorry, one more thing just to be a bit more specific on my request for examples. I'm aware these exist:

https://github.com/HubSpot/slack-client/tree/master/slack-java-client-examples/src/main/java/com/hubspot/slack/client/examples

And I used them as the model for how I wrote my code. But I'm now realizing that if you looped these examples 1000 times over in the same JVM, they'd produce the same resource starvation I'm facing. I think :)

szabowexler commented 4 years ago

I think you're right! We definitely didn't load test those 😰

So skimming the code, it looks to me like the problem is in how we treated NingAsyncHttpClient. The factory that wraps the client doesn't provide you any reuse, so the real issue is that we didn't expose a way for you to construct a common http client and then reuse it among iterations of the client that differ solely on the token axis.

It seems to me that if we update SlackClientRuntimeConfigIF to have an optional NioHttpClient, then you could construct that for yourself wherever you reuse it, and set it on the client so that we don't continuously create new NingAsyncHttpClient instances, since that looks like the leak in this case.
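
From the caller's side that might look roughly like the sketch below. To be clear, the setHttpClient(...) setter is purely hypothetical at this point (it stands in for whatever we'd actually expose on SlackClientRuntimeConfigIF), and the package paths are approximate:

```java
import com.hubspot.slack.client.SlackClient;
import com.hubspot.slack.client.SlackClientFactory;
import com.hubspot.slack.client.SlackClientRuntimeConfig;
import com.hubspot.slack.client.http.NioHttpClient;

// Hypothetical sketch: build one NioHttpClient up front and hand it to every
// per-token config, so each SlackClient shares the same connection pool
// instead of creating its own NingAsyncHttpClient.
class SharedHttpClientSketch {
  private final NioHttpClient sharedHttpClient; // constructed once, reused everywhere

  SharedHttpClientSketch(NioHttpClient sharedHttpClient) {
    this.sharedHttpClient = sharedHttpClient;
  }

  SlackClient clientFor(String userToken) {
    SlackClientRuntimeConfig config = SlackClientRuntimeConfig.builder()
        .setTokenSupplier(() -> userToken)
        .setHttpClient(sharedHttpClient) // hypothetical setter, not in the current API
        .build();

    return SlackClientFactory.defaultFactory().build(config);
  }
}
```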

szabowexler commented 4 years ago

I wrote up a quick PR that allows you to reuse the underlying http client. I do think it's tricky here, since you can't close the client down until your requests have finished (i.e. if you do try-with-resources, you must get/join your calls before the end of the block).

lightbody commented 4 years ago

Thanks. And in the meantime, how do you recommend we close the client properly if we didn't reuse anything? I'm fine with that as a short term solution, but when I drop things into a try-with-resources I get errors that the connection was closed unexpectedly, which kinda makes sense due to the async nature of the code.

szabowexler commented 4 years ago

Right. Basically you will need to ensure that you call close on the client only after you've completed whatever operations you needed it for. So either you need to ensure all your operations have a join/get on the CompletableFuture within the try block, or you need to hold on to the client in the method where you're doing the work and manually call close afterwards. Does that make sense?
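
For the first option, a rough sketch (someAsyncCall is just a stand-in for any client method that returns a CompletableFuture):

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

import com.hubspot.slack.client.SlackClient;
import com.hubspot.slack.client.SlackClientFactory;
import com.hubspot.slack.client.SlackClientRuntimeConfig;

class CloseAfterJoin {
  void doWork(String token) throws IOException {
    SlackClientRuntimeConfig config = SlackClientRuntimeConfig.builder()
        .setTokenSupplier(() -> token)
        .build();

    try (SlackClient slack = SlackClientFactory.defaultFactory().build(config)) {
      CompletableFuture<?> pending = someAsyncCall(slack);

      // The important part: join/get *inside* the try block, so close()
      // only runs after the request has actually finished.
      pending.join();
    }
  }

  private CompletableFuture<?> someAsyncCall(SlackClient slack) {
    // Placeholder for a real API call on the client.
    return CompletableFuture.completedFuture(null);
  }
}
```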

lightbody commented 4 years ago

Yes, sorry I see you explained that in your previous message. Unfortunately, due to the way our code is structured that isn't easily doable right now. So I think I'll use your PR and reuse the HTTP client for now.

szabowexler commented 4 years ago

That should work! I've merged it in, but it may be a bit before the next release (not more than 2w) since we just cut one

lightbody commented 4 years ago

All good, I'm still on my own fork anyway (see PR #166) so I'll cut a release for myself now.

szabowexler commented 4 years ago

I went ahead and merged that fellow, no sense letting it diverge too much

lightbody commented 4 years ago

Just a quick note: we're live with the update, reusing the same HTTP client and it's working great. Thank you!

jaredstehler commented 7 months ago

closing this as completed; please re-open if desired.