absinthe-graphql / absinthe

The GraphQL toolkit for Elixir
http://absinthe-graphql.org
Other
4.29k stars 527 forks source link

Add async option to Absinthe.Subscription #1329

Closed bryanjos closed 3 months ago

bryanjos commented 3 months ago

We have an app that is a heavy user of subscriptions. We noticed that one of our servers were failing health checks, causing it to restart. Eventually we found that we were running into an issue of unbounded concurrency that led us to the proxy here in absinthe. We resolved it by simply removing the task supervisor. This creates a bottleneck here that limits concurrency. It doesn't completely limit concurrency though as the user can still control the level of concurrency via the pool option.

This PR adds the option to allow Proxy to handle events from pubsub async or not.

benwilson512 commented 3 months ago

hey @bryanjos notably the trade off here is that these processes are also now more likely to have their message queue grow if the pool can't keep up.

I get the change here, but I think we also need to probably look at some overflow protection ideas here too.

benwilson512 commented 3 months ago

The other thing to note here is that the theory was that if your user base is evenly distributed between the nodes, the activity coming in via these processes would be no worse than the activity driven by user engagement, so as long as that wasn't OOMing you should be good. If that isn't holding then can you elaborate a bit on what is driving the subscription publication?

bryanjos commented 3 months ago

@benwilson512 I recognize the tradeoff and even brought it up when this was suggested by a teammate. But somehow it's behaving fine now.

It is supposed to be temporary though. Ideally we'd want something with some back pressure there and controlled concurrency. For instance, I proposed a change in our fork to add gen_stage and created a producer that listened for pubsub changes that were picked up by a ConsumerSupervisor where the max concurrency was equal to max_demand.

So maybe this solution wouldn't work here, and maybe one with some added back pressure would? Or at least the ability to customize that part of things. Some of that is already possible. Changing whatever module implements Absinthe.Subscription.Pubsub.publish_mutation to instead broadcast the message to a different topic that the proxy is listening to, to some other processes with back pressure can get you there.

What is driving our subscription publication is changes to uploaded files from our media pipeline. The users listening for subscriptions are evenly distributed across nodes. We never ran out of memory but there was so many processes that our health check was taking too long to get a response from absinthe saying it was alive. Maybe #1330 solves that though as it was causing a lot of waiting.

bryanjos commented 3 months ago

@benwilson512 any more thoughts on this?

I'm thinking if this isn't acceptable, is there any existing mechanism for customizing that part of proxy? Or anything we could introduce?

Ultimately I'm thinking some form of back pressure might be needed. And if that's the direction we should go, I can close this and propose something around that.

benwilson512 commented 3 months ago

@bryanjos can you make this an option perhaps? This way the default behavior can be async but if you want you can do {Absinthe.Subscription, async: false} and then it does this sync based logic?

bryanjos commented 3 months ago

@benwilson512 yes I can do that! Will update it soon

bryanjos commented 3 months ago

@benwilson512 I made the changes. Let me know what you think. I also updated the changelog with the unreleased PRs merged

bryanjos commented 3 months ago

Awesome! Thanks!