aws / aws-sdk-ruby

The official AWS SDK for Ruby.
https://aws.amazon.com/sdk-for-ruby/
Apache License 2.0
3.58k stars 1.23k forks source link

RuntimeError: can't add a new key into hash during iteration #1331

Closed wjordan closed 6 years ago

wjordan commented 8 years ago

On our production application we are receiving intermittent stack traces on S3 requests- the errors arrive in bursts of a couple hundred over an hour or two, then nothing for a few days, then another burst (6 over the last 30 days). I suspect this might be some sort of concurrency issue related to network-error retries somewhere deep in the seahorse stack, but could use some help figuring out what exactly is going on and how to fix.

Using aws-sdk-core 2.6.1.

Relevant part of the stack trace (net_http):

Full aws-sdk stack trace with all handlers:

[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/connection_pool.rb:40 :in `block in initialize`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/connection_pool.rb:89 :in `yield`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/connection_pool.rb:89 :in `block in session_for`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/connection_pool.rb:87 :in `synchronize`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/connection_pool.rb:87 :in `session_for`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/handler.rb:116 :in `session`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/handler.rb:68 :in `transmit`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/handler.rb:42 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/plugins/content_length.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_request_signer.rb:88 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_request_signer.rb:23 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/xml/error_handler.rb:8 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/helpful_socket_errors.rb:10 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_request_signer.rb:65 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_redirects.rb:15 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/retry_errors.rb:87 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_dualstack.rb:31 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_accelerate.rb:42 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_md5s.rb:31 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_expect_100_continue.rb:21 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_bucket_name_restrictions.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_bucket_dns.rb:31 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/rest/handler.rb:7 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/user_agent.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/plugins/endpoint.rb:41 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_url_encoded_keys.rb:39 :in `manage_keys`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_url_encoded_keys.rb:31 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/param_validator.rb:21 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/plugins/raise_response_errors.rb:14 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_sse_cpk.rb:19 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_dualstack.rb:23 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_accelerate.rb:33 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/param_converter.rb:20 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/response_paging.rb:26 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/plugins/response_target.rb:21 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/request.rb:70 :in `send_request`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/base.rb:207 :in `block (2 levels) in define_operation_methods`
[...Application code stacktrace...]

Reproducing the issue with debug logging would be challenging (since the issue only occurs intermittently in production), so I'm hoping to track this down by inspecting the codebase directly.

trevorrowe commented 7 years ago

I don't see an obvious care where we are adding keys to the hash during iteration. The @pool hash is protected by a mutex and is only enumerated or indexed behind that mutex. The stack trace is terminating inside the hash's block that yields for missing keys.

I can rewrite the code to avoid using the default block for assigning the list. This should be a simple change that I can ship with the next release so you can test the change.

trevorrowe commented 7 years ago

This change should go out with our next release. Once this is live, can you update and let me know if this resolves the issue you've been experiencing?

wjordan commented 7 years ago

Thanks for looking into this! I will update as soon as a new release is cut.

Unfortunately since the issue is very intermittent (most recently we saw ~2 weeks between occurrences on our production traffic- we logged 692 errors on 11/4, then 139 errors on 11/17), It will be difficult to let you know if it definitively resolves the issue. However I will be sure to update here if the error ever occurs again after updating.

awood45 commented 7 years ago

Going to go ahead and close this for now, but feel free to reopen if you see this again after upgrading to the latest version.

islemaster commented 7 years ago

Hi there! I'm a colleague of @wjordan. This issue has recurred for us using aws-sdk-core 2.9.14. Here's a recent stacktrace:

RuntimeError: can't add a new key into hash during iteration
101             # No error raised? Good, check the session into the pool.
102             @pool_mutex.synchronize do
103               @pool[endpoint] = [] unless @pool.key?(endpoint)
104               @pool[endpoint] << session
105             end
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/net_http/connection_pool.rb:103 :in `block in session_for`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/net_http/connection_pool.rb:102 :in `synchronize`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/net_http/connection_pool.rb:102 :in `session_for`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/net_http/handler.rb:119 :in `session`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/net_http/handler.rb:71 :in `transmit`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/net_http/handler.rb:45 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/plugins/content_length.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_request_signer.rb:88 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_request_signer.rb:23 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/xml/error_handler.rb:8 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/helpful_socket_errors.rb:10 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_request_signer.rb:65 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_redirects.rb:15 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/retry_errors.rb:88 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_dualstack.rb:32 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_accelerate.rb:49 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_md5s.rb:31 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_expect_100_continue.rb:21 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_bucket_name_restrictions.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_bucket_dns.rb:31 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/rest/handler.rb:7 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/user_agent.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/plugins/endpoint.rb:41 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/param_validator.rb:21 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/plugins/raise_response_errors.rb:14 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_sse_cpk.rb:19 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_dualstack.rb:24 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_accelerate.rb:34 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:20 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/idempotency_token.rb:18 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/param_converter.rb:20 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/plugins/response_target.rb:21 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/request.rb:70 :in `send_request`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/base.rb:207 :in `block (2 levels) in define_operation_methods`
[...Application code stacktrace...]
awood45 commented 7 years ago

Let's reopen and take a look...

joshlory commented 7 years ago

Is there any additional information we can provide to help investigate the error? Thanks!

awood45 commented 7 years ago

It's just interesting because Trevor's original comment about the mutexes seems to still apply. If you have some sort of scenario that can reliably recreate this (if possible) that may help.

Additionally, how frequently are you seeing this issue currently?

jeremydstone commented 7 years ago

It occurs periodically in bursts. For example, we had this happen not at all for the 2 weeks leading up to June 13. Then it happened 252 times on June 13 and 119 times on June 14. Then not at all for a few days, then 511 times on June 19.

When this does happen, it appears to be in a concentrated burst at a rate of about 1-2 faults per minute. They have all been on the same front end server within a burst as far as I can tell, so something about state of the gem or the server, rather than some externality such as network. (This is running on an EC2 instance within AWS VPC, btw.)

Even with the intermittency, this happens enough that this is presently our top operational issue in terms of faults/week.

awood45 commented 6 years ago

This is an issue where I'd encourage an upgrade to V3 of the SDK. It is generally better with concurrency, and while I can't guarantee it fixes this issue, it should make it a lot easier to narrow down with the more detailed stack traces.

awood45 commented 6 years ago

Closing soon if this remains inactive, I'd love to know if V3 doesn't resolve this.

wjordan commented 6 years ago

We haven't upgraded to V3 yet, but I will follow up here when we do complete an upgrade. This is still affecting our production application- the last burst was 1,232 instances of this error on Apr 29, running version 2.10.79.

islemaster commented 6 years ago

Hi there! This issue just occurred again for us using aws-sdk-core 3.22.1, aws-sdk-firehose 1.4.0. We saw a burst of 600 instances of this error in a 24-hour window this weekend, September 15-16.

We're currently investigating available gem upgrades to see if they help:

Here's a recent backtrace:

[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/net_http/connection_pool.rb:107 :in `block in session_for`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/net_http/connection_pool.rb:106 :in `synchronize`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/net_http/connection_pool.rb:106 :in `session_for`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/net_http/handler.rb:121 :in `session`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/net_http/handler.rb:73 :in `transmit`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/net_http/handler.rb:47 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/plugins/content_length.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/json/error_handler.rb:8 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/signature_v4.rb:66 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/helpful_socket_errors.rb:10 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/retry_errors.rb:138 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/json/handler.rb:11 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/user_agent.rb:13 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/plugins/endpoint.rb:45 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/param_validator.rb:24 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/plugins/raise_response_errors.rb:14 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:20 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/idempotency_token.rb:17 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/param_converter.rb:24 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/response_paging.rb:10 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/plugins/response_target.rb:23 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/request.rb:70 :in `send_request`
[GEM_ROOT]/gems/aws-sdk-firehose-1.4.0/lib/aws-sdk-firehose/client.rb:1040 :in `put_record`
[Application repo]/lib/cdo/firehose.rb:52 :in `put_record`

Firehose showing up in the backtrace is new, and we're investigating to see if there's something we're doing wrong in our application layer that might cause this, or any infrastructure-level events that might explain it. In the meantime we figured a backtrace from the v3 sdk might be helpful. Thank you!

awood45 commented 6 years ago

Let's reopen and take a look then. One other thing that may be relevant, can you tell us a bit more about how you're doing concurrency in your application? It may help us to attempt to reproduce your issue.

srchase commented 6 years ago

@islemaster

Do you have any additional details that would help us reproduce the issue?

srchase commented 6 years ago

Closing this issue. Happy to re-open if there are additional details to reproduce.

n-ikhil commented 2 years ago

We had the same issue, in aws-sdk-core (3.104.4) We used passenger as web server with smart spawning (which is default). We suspected this issue is due to copy-on-write used in forking process. You can refer this link on how passenger does this. You can try using direct spawning if you have similar setup. In our case we made changes in our code, cleared all connection pool in fork hook and these errors stopped completely

rottgoth commented 1 year ago

Hi, I just started seeing this issue with version aws-sdk-core (3.180.3) while ramping up concurrency with puma. This is the stack trace that we got:

at block in session_for(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/net_http/connection_pool.rb:111)
at synchronize(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/net_http/connection_pool.rb:110)
at session_for(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/net_http/connection_pool.rb:110)
at session(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/net_http/handler.rb:128)
at transmit(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/net_http/handler.rb:76)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/net_http/handler.rb:50)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/plugins/content_length.rb:24)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/plugins/request_callback.rb:87)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-dynamodb-1.93.1/lib/aws-sdk-dynamodb/plugins/crc32_validation.rb:32)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/json/error_handler.rb:10)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/sign.rb:49)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/transfer_encoding.rb:26)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/helpful_socket_errors.rb:12)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/retry_errors.rb:360)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/user_agent.rb:37)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/http_checksum.rb:19)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/endpoint_pattern.rb:30)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/checksum_algorithm.rb:136)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/request_compression.rb:94)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/json/handler.rb:13)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/recursion_detection.rb:18)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-dynamodb-1.93.1/lib/aws-sdk-dynamodb/plugins/endpoints.rb:41)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/endpoint_discovery.rb:84)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/plugins/endpoint.rb:47)
at block in call(/usr/local/bundle/ruby/3.0.0/gems/ddtrace-0.54.2/lib/ddtrace/contrib/aws/instrumentation.rb:21)
at trace(/usr/local/bundle/ruby/3.0.0/gems/ddtrace-0.54.2/lib/ddtrace/tracer.rb:283)
at call(/usr/local/bundle/ruby/3.0.0/gems/ddtrace-0.54.2/lib/ddtrace/contrib/aws/instrumentation.rb:20)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/param_validator.rb:26)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/logging.rb:41)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/plugins/raise_response_errors.rb:16)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-dynamodb-1.93.1/lib/aws-sdk-dynamodb/plugins/simple_attributes.rb:119)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/checksum_algorithm.rb:111)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:16)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/idempotency_token.rb:19)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/param_converter.rb:26)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/plugins/request_callback.rb:71)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/response_paging.rb:12)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/plugins/response_target.rb:24)
at send_request(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/request.rb:72)
at describe_table(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-dynamodb-1.93.1/lib/aws-sdk-dynamodb/client.rb:3008)

Are you still seeing this issue? Are there any ideas on how to fix it?

yokonao commented 6 months ago

We had encountered the same issue.

As a result of our investigation, I succeeded in writing a small reproduction code.

require 'aws-sdk-s3'
require 'aws-sdk-dynamodb'

Aws::S3::Client.new.list_buckets
puts Seahorse::Client::NetHttp::ConnectionPool.pools.first.instance_variable_get(:@pool)
# {"https://s3.ap-northeast-1.amazonaws.com"=>[#<Net::HTTP s3.ap-northeast-1.amazonaws.com:443 open=true>]}

Thread.new do
  pool = Seahorse::Client::NetHttp::ConnectionPool.pools.first
  loop { pool.size } # iterate Seahorse::Client::NetHttp::ConnectionPool @pool forever
end

sleep 0.1 # wait a miniute for the thread to start

fork do
  Aws::DynamoDB::Client.new.list_tables
  # => can't add a new key into hash during iteration (RuntimeError)
  # error occurs when trying `@pool["https://dynamodb.ap-northeast-1.amazonaws.com"] = []`
end

This program raises RuntimeError: can't add a new key into hash during iteration!

This may be due to a race condition between the parent and child process when using fork(2) system call.

How to fix? Please add a line the begging of the forked process for clearing the seahorse connection pools. Like following:

fork do
  Aws.empty_connection_pools!

  ...
end
mullermp commented 6 months ago

Thanks for adding a reproduction for this. I understand this is some long standing bug. I was able to reproduce, but then fix, when I change the size method to not iterate (just make a call to @pool.values.flatten.size) However I assume this would still happen because there are other methods that iterate the pool with each_pair, such as empty! and _clean. My understanding is, since fork makes a copy of the mutex, synchronize does not help here. Fork may be copying the state during the iteration.

If it helps, I can push a fix to that size method, but I can't guarantee you will stop seeing this?