getsentry / sentry-ruby

Sentry SDK for Ruby
https://sentry.io/for/ruby
MIT License

SystemStackError when serializing breadcrumbs, potentially causing serious issues in Sidekiq when used in conjunction with background workers #2397

Open drj17 opened 1 month ago

drj17 commented 1 month ago

Issue Description

Discord thread

Continuing this report from a Discord thread I created a few days ago. We have been investigating our Sidekiq workers becoming unresponsive and think we have traced the issue to a combination of two things.

First, a SystemStackError when serializing the breadcrumbs. Here's some relevant logging:

1765378439836037121 2024-08-28 09:10:41 2024-08-28 09:10:41 13994936442 blaze-ai-rails 34.225.253.144 Local7 Info app/almanacworker.3 /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.5/lib/active_support/core_ext/object/instance_variables.rb:15: warning: Exception in finalizer #<Proc:0x00007fe503ce5a28 (lambda)>
1765378439836037122 2024-08-28 09:10:41 2024-08-28 09:10:41 13994936442 blaze-ai-rails 34.225.253.144 Local7 Info app/almanacworker.3 /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.5/lib/active_support/core_ext/object/instance_variables.rb:15:in `[]': SystemStackError
1765378439836037123 2024-08-28 09:10:41 2024-08-28 09:10:41 13994936442 blaze-ai-rails 34.225.253.144 Local7 Info app/almanacworker.3     from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.5/lib/active_support/core_ext/object/instance_variables.rb:15:in `instance_values'
1765378439836037124 2024-08-28 09:10:41 2024-08-28 09:10:41 13994936442 blaze-ai-rails 34.225.253.144 Local7 Info app/almanacworker.3     from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.5/lib/active_support/core_ext/object/json.rb:63:in `as_json'
1765378439836037125 2024-08-28 09:10:41 2024-08-28 09:10:41 13994936442 blaze-ai-rails 34.225.253.144 Local7 Info app/almanacworker.3     from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.5/lib/active_support/core_ext/object/json.rb:180:in `block in as_json'
1765378439836037126 2024-08-28 09:10:41 2024-08-28 09:10:41 13994936442 blaze-ai-rails 34.225.253.144 Local7 Info app/almanacworker.3     from /app/vendor/bundle/ruby/3.1.0/gems/activesupport-7.0.5/lib/active_support/core_ext/object/json.rb:179:in `each'

I believe https://github.com/getsentry/sentry-ruby/issues/2393 was created to address this.

A more insidious issue followed: when this error occurred, the background thread appeared to get stuck. Eventually our Sidekiq instances stopped responding altogether. We think this was fixed by disabling background workers on Sidekiq:

  config.background_worker_threads = 0 if Sidekiq.server?

After setting this we've no longer seen crashes, but it's also possible that it just lowered the rate of issues enough that the daily cycling of Sidekiq prevented it from ever getting to the point of killing a dyno entirely.

Reproduction Steps

Unsure - potentially trying to serialize a very large object.
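
As a rough illustration of how a recursive serializer can hit a SystemStackError on a large, deeply nested object, here is a minimal plain-Ruby sketch. It does not use ActiveSupport or the Sentry SDK, and it is not a confirmed reproduction of this issue: `build_chain`, `naive_serialize`, and `serializable?` are hypothetical helpers, and `naive_serialize` only stands in for a recursive `as_json`-style walk.

```ruby
# Hypothetical helper: builds a chain of nested hashes `depth` levels deep.
# Built iteratively, so construction itself never recurses.
def build_chain(depth)
  root = { value: 0, child: nil }
  node = root
  1.upto(depth) do |i|
    node[:child] = { value: i, child: nil }
    node = node[:child]
  end
  root
end

# Hypothetical stand-in for a recursive serializer: one Ruby method call
# per nesting level, so stack usage grows with the depth of the object.
def naive_serialize(node)
  return nil if node.nil?

  { "value" => node[:value], "child" => naive_serialize(node[:child]) }
end

# SystemStackError is not a StandardError, so it has to be rescued by name;
# a bare `rescue` would let it propagate.
def serializable?(node)
  naive_serialize(node)
  true
rescue SystemStackError
  false
end

puts serializable?(build_chain(10))
puts serializable?(build_chain(100_000))
```

With a default-sized stack, the shallow chain should serialize and the 100,000-level chain should exhaust the stack, but the exact depth at which that happens depends on the Ruby build and stack settings.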

Expected Behavior

  1. SystemStackErrors don't occur at all
  2. Failures in the background worker are handled gracefully
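
To make the second expectation concrete, here is a minimal sketch of a background worker that survives a failing job. It is an assumption-laden illustration in plain Ruby, not the sentry-ruby background worker: `ResilientWorker` and its methods are hypothetical names. The point it shows is that SystemStackError inherits from Exception rather than StandardError, so a bare `rescue` does not catch it and the error has to be named explicitly.

```ruby
# Hypothetical worker, not the sentry-ruby implementation.
class ResilientWorker
  STOP = Object.new.freeze # sentinel that ends the loop

  attr_reader :failures

  def initialize
    @queue = Queue.new
    @failures = Queue.new # thread-safe record of rescued errors
    @thread = Thread.new { run }
  end

  # Enqueue a block to run on the worker thread.
  def post(&job)
    @queue << job
  end

  # Process everything already queued, then stop the thread.
  def shutdown
    @queue << STOP
    @thread.join
  end

  private

  def run
    loop do
      job = @queue.pop
      break if job.equal?(STOP)

      begin
        job.call
      rescue StandardError, SystemStackError => e
        # Record the failure and keep draining the queue, so a single
        # bad job cannot take the worker thread down.
        @failures << e
      end
    end
  end
end

worker = ResilientWorker.new
worker.post { raise SystemStackError, "stack level too deep" }
worker.post { puts "later job still ran" }
worker.shutdown
puts "failures recorded: #{worker.failures.size}"
```

Rescuing per job rather than around the whole loop is the design choice that matters here: the thread keeps running after a failure instead of exiting or stalling.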

Actual Behavior

  1. SystemStackError occurs while serializing a breadcrumb
  2. Sidekiq process becomes unresponsive

Ruby Version

3.1.3

SDK Version

5.9.0

Integration and Its Version

No response

Sentry Config

No response