bigcommerce / gruf

gRPC Ruby Framework
https://github.com/bigcommerce/gruf
MIT License
626 stars 73 forks source link

Did release 2.17.0 break call concurrency? #206

Closed Abdullah-l closed 3 months ago

Abdullah-l commented 8 months ago

Please describe the issue

Hey guys @splittingred @leonid-shevtsov, check this out: Create two RPCs

  1. Client streaming
  2. Anything else (I tried server streaming and unary)

Place a log in both RPCs Call RPC 1 then 2

On gruf 2.16.2 it behaves normally by printing both logs

On gruf 2.17.0 it does not print the log in RPC 2. The server does not acknowledge the call at all, the client keeps loading indefinitely unless I cancel RPC 1.

This blocking behaviour I suspect was introduced in #178 as it's the only PR in that release that touches threading.

The issue is reproducible on the latest version too (2.19.0)

Anything else we should know?

splittingred commented 8 months ago

Hi @Abdullah-l - I'm not able to replicate this; can you provide a repository/code sample of this occurring?

splittingred commented 8 months ago

Further illustrated no apparent issue here: https://github.com/bigcommerce/gruf/pull/207

Abdullah-l commented 8 months ago

Hey @splittingred, I managed to reproduce it in your rake task :)

I created two branches with the exact same changes

  1. https://github.com/Abdullah-l/gruf/tree/after-2.17
  2. https://github.com/Abdullah-l/gruf/tree/before-2.17

The issue is easily seen in the logs Before 2.17.0

I, [2024-02-23T02:24:05.015640 #20703]  INFO -- : Received get1 at 2024-02-23 02:24:05 +0300
I, [2024-02-23T02:24:06.221235 #20703]  INFO -- : Received get2 at 2024-02-23 02:24:06 +0300
I, [2024-02-23T02:24:06.325923 #20703]  INFO -- : Received get3 at 2024-02-23 02:24:06 +0300

After 2.17.0

I, [2024-02-23T02:25:00.553379 #21885]  INFO -- : Received get1 at 2024-02-23 02:25:00 +0300
I, [2024-02-23T02:25:30.554698 #21885]  INFO -- : Received get2 at 2024-02-23 02:25:30 +0300
I, [2024-02-23T02:25:30.554768 #21885]  INFO -- : Received get3 at 2024-02-23 02:25:30 +0300

As you can see after 2.17.0 get2 and get3 requests are blocked for a while there. They arrived 30 seconds later (after get1 timed out).

splittingred commented 8 months ago

@Abdullah-l can you provide more information about your working environment:

Abdullah-l commented 8 months ago

Ruby version - I tried 2.7, 3.0, 3.1 Rubygems version - 3.3.3 gRPC version installed - 1.61

I'm like 99.99% sure that 2.17.0 broke it

I hope you can reproduce it :)

You can just checkout the two branches I linked in the previous message and run:

./spec/demo_server
bundle exec rake gruf:demo:concurrent

on each one and see the logs

leonid-shevtsov commented 8 months ago

And to clarify, is this under development mode or production mode? (What does Gruf.development? say?)

Basically the new code uses a read-write lock; any number of requests can concurrently "read", that is, access the controller - but when the controller is reloaded the lock is "write" locked and no concurrency is allowed.

I'd add some logging statements to see the order of operations around reload, the write lock acquire/release and the read lock acquire release.

Abdullah-l commented 8 months ago

Using the rake task @splittingred wrote, Gruf.development? is true

However the issue still occurs when I used it with rails on a staging environment.

@leonid-shevtsov regarding adding logs, I didn't fully understand where to add them and what are we looking for 😅 is it here?

leonid-shevtsov commented 8 months ago

is it here?

yeah, here and also in lines 54-56; before the block (locking), at the beginning of the block (locked), and after it (unlocked); normally, you should not see a very long write locked period (that would be breaking concurrency indeed)

hmm... perhaps what's happening is the first request locks it for reading, and then the second wants to lock it for writing and hence has to wait until the first one completes? but this should only happen in a development environment.

could it be possible that you're running it in dev mode on staging? dev mode enables code reloading and possibly other things that you generally don't need or want outside of a dev environment.

and looks like the fallback environment is development:

https://github.com/bigcommerce/gruf/blob/2a583e57c3cce1a960657c2337aa9a93656a6455/lib/gruf/configuration.rb#L205-L222

Abdullah-l commented 3 months ago

Closing because I gave up on this 😅