getsentry / sentry-ruby

Sentry SDK for Ruby
https://sentry.io/for/ruby
MIT License
933 stars 494 forks source link

Invalid user in context in a sidekiq job #2194

Closed johnvuko closed 10 months ago

johnvuko commented 11 months ago

Issue Description

Hello, I've recently upgrade from raven-ruby 3.1.2 to sentry-ruby 5.1.5, sentry-rails 5.1.5 and sentry-sidekiq 5.1.5. Since in the exception comming from a Sidekiq job I see on Sentry a sentry_user but the only place where I set an user is in the ApplicationControler. I've also enable performance and profiling.

Reproduction Steps


# config/initializers/sentry.rb

require 'sentry-ruby'
require 'sentry-rails'
require 'sentry-sidekiq'

Sentry.init do |config|
  config.dsn = Rails.application.credentials.dig(:sentry, :dsn)
  config.enabled_environments = %w[production]
  config.logger.level = Logger::WARN
  config.breadcrumbs_logger = [:sentry_logger, :active_support_logger, :http_logger]
  config.send_default_pii = true
  config.excluded_exceptions = [
    'ActionController::RoutingError',
    'ActionController::InvalidAuthenticityToken'
  ]

  config.traces_sample_rate = 0.5
  config.profiles_sample_rate = 0.5
end

# app/controllers/application_controller.rb

class ApplicationController < ActionController::API
  before_action :set_sentry_user

  def set_sentry_user
     user = User.find(params[:user_id]) # user is found with access token in real application     
     Sentry.set_user(
        id: user.id,
        username: user.name,
        email: user.email
      )
  end
end

# app/jobs/my_job.rb

class MyJob
  include Sidekiq::Job

  sidekiq_options retry: 0

  def perform(some_params)
    Sentry.capture_message("[#{self.class}] there is a problem", extra: some_params)
  end
end

Expected Behavior

To not have a sentry_user in the Sidekiq section of the error.

Actual Behavior

There is a valid / existing sentry_user in the Sidekiq section of the error, probably leaking from somewhere. But strange because the server only process sidekiq job and Sentry.set_user is only call in application_controller. The user is not related to the job, the job is started by an api call from an external service.

Ruby Version

2.7.5

SDK Version

5.15.0

Integration and Its Version

Rails 6.1.7.6, Sidekiq 7.2.0

Sentry Config

require 'sentry-ruby' require 'sentry-rails' require 'sentry-sidekiq'

Sentry.init do |config| config.dsn = Rails.application.credentials.dig(:sentry, :dsn) config.enabled_environments = %w[production] config.logger.level = Logger::WARN config.breadcrumbs_logger = [:sentry_logger, :active_support_logger, :http_logger] config.send_default_pii = true config.excluded_exceptions = [ 'ActionController::RoutingError', 'ActionController::InvalidAuthenticityToken' ]

config.traces_sample_rate = 0.5 config.profiles_sample_rate = 0.5 end

sl0thentr0py commented 11 months ago

hmm that is very strange, i need a bit more info

johnvuko commented 11 months ago

Hello, Yes the job is schedule with a perform_async, I don't use ActiveJob but include Sidekiq::Job. The problem is in production environment with Redis, I don't use inline mode. I use AWS ECS, Puma and Sidekiq instances are on different container. So even, it Sentry current user was a global variable not thread safe it should not leaks into jobs.

The server is in api mode. Maybe it is the tracing that has a problem, but some jobs are not related to an user (webhook) and has a sentry_user. The problem appear in every errors for every jobs, not one specifically.

I've also added in the same time of the upgrade of Sentry, the gem newrelic to minitor performance, maybe it collide somehow.

Last thing, I have also a middleware (was alredy present and working before the migration):

class MiddlewareSidekiqServerTrackingJob
  include Sidekiq::ServerMiddleware

  def call(_job_instance, job_payload, _queue)
    begin
      if job_payload['first_arg_organization'] == true
        organization_id = job_payload['args'].first
        key = [organization_id, 'jobs_count'].join('-')
        begin
          Thread.current[:redis] ||= Redis.new(url: ENV.fetch('REDIS_URL') { 'redis://localhost:6379/0' })
          Thread.current[:redis].incr(key)
        rescue
          nil
        end
      end
      yield
    ensure
      if job_payload['first_arg_organization'] == true
        organization_id = job_payload['args'].first
        key = [organization_id, 'jobs_count'].join('-')
        begin
          Thread.current[:redis] ||= Redis.new(url: ENV.fetch('REDIS_URL') { 'redis://localhost:6379/0' })
          Thread.current[:redis].set(key, 0) if Thread.current[:redis].decr(key) < 0
        rescue
        end
      end
    end
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add MiddlewareSidekiqServerTrackingJob
  end
end
sl0thentr0py commented 11 months ago

ah ok sorry i forgot that we indeed take the user and pass it on from the client middleware https://github.com/getsentry/sentry-ruby/blob/700046e8eaf05b53efb7e36d8abb358302f500d9/sentry-sidekiq/lib/sentry/sidekiq/sentry_context_middleware.rb#L60-L61

so if the user is set on the scope wherever you call perform_async that will be carried forward to the task. Is this not something you want? We can make it configurable and optional then.

johnvuko commented 11 months ago

That's a nice and useful feature but no my issue.

Some jobs are started by webhooks (a partner call an endpoint which call perform_async on a job), so no user is related or set in this process, but appear in job errors. Maybe I need to reset the user on Sentry? But it would indicate the scope is keep across different request which seems odd.

sl0thentr0py commented 11 months ago

no this shouldn't happen then, that's very strange, scope is isolated per request. I would need some kind of reproduction then, because I do not see a way this can happen if your controller action is not setting the user.

sl0thentr0py commented 11 months ago

Some jobs are started by webhooks (a partner call an endpoint which call perform_async on a job), so no user is related or set in this process, but appear in job errors.

is the rails server receiving webhooks the same as the one processing other normal actions (which do set user)?

johnvuko commented 11 months ago

Yes the rails server receiving the webhooks is the server which receive every HTTP requests and the server which create the jobs. The sidekiq instances which process the jobs are separate.

Sentry.set_user is call only when an user access token is provided in an API call and the user come from this access token. The problem also appear from controllers that doesn't inherit from the controller containing the Sentry.set_user, we have separate base controller for some endpoints (for partners or internal services).

Is there a way to see / inspect the scope of Sentry, so I can try to provide a sample project?

I will try to call Sentry.set_user({}) to see if it changes something.

sl0thentr0py commented 11 months ago

can you maybe log Sentry.get_current_scope.user in a before_action for the webhook controllers? That will tell you if the scope/user is somehow leaking into those threads.

johnvuko commented 11 months ago

I created a projet to test https://github.com/jonathantribouharet/sentry-ruby-2194

The problem occurs randomly and only if sentry is installed via a third party gem. You have to call a first time an endpoint setting an user, and then call multiple times another endpoint not setting the user, on this test project, you need 3 to 6 times to have an invalid response (with the user).

The issue being random, I think it is probably related to multi-thread.

sl0thentr0py commented 11 months ago

hmmm so in your example, you are only using sentry-ruby without sentry-rails which means our CaptureExceptions middleware that takes care of the scope isolation per request does not get injected. So in that case, there will indeed be scope leakage.

But I see now that this is ActionController::API, maybe our injection does not work there in general, I will test when I have time next week and get back to you.

johnvuko commented 11 months ago

In a previous commit of the test project I was using sentry-rails and there was the same issue, I removed it thinking it would help to find the souce of the problem. The bug also appear with ActionController::Base.

sl0thentr0py commented 10 months ago

ok so this is about you packaging it inside your gem and the railtie somehow not running, maybe you need to make a railtie in that gem and call our railtie (not sure how it works).

You can see that Sentry::Rails::CaptureExceptions is missing in the middleware with your setup

gem 'sentry-test-bug', path: 'lib/sentry-test-bug'
➜  sentry-ruby-2194 git:(main) ✗ bundle exec rails c
Loading development environment (Rails 6.1.7.6)
3.2.2 :001 > Rails.configuration.middleware
 =>
#<ActionDispatch::MiddlewareStack:0x0000000107cb7c78
 @middlewares=
  [ActionDispatch::HostAuthorization,
   Rack::Sendfile,
   ActionDispatch::Static,
   ActionDispatch::Executor,
   ActiveSupport::Cache::Strategy::LocalCache::Middleware,
   Rack::Runtime,
   Rack::MethodOverride,
   ActionDispatch::RequestId,
   ActionDispatch::RemoteIp,
   Rails::Rack::Logger,
   ActionDispatch::ShowExceptions,
   ActionDispatch::DebugExceptions,
   ActionDispatch::ActionableExceptions,
   ActionDispatch::Reloader,
   ActionDispatch::Callbacks,
   ActionDispatch::Cookies,
   ActionDispatch::Session::CookieStore,
   ActionDispatch::Flash,
   ActionDispatch::ContentSecurityPolicy::Middleware,
   ActionDispatch::PermissionsPolicy::Middleware,
   Rack::Head,
   Rack::ConditionalGet,
   Rack::ETag,
   Rack::TempfileReaper]>

if i remove that gem and add

gem 'sentry-ruby'
gem 'sentry-rails'

I get the correct middleware inserted

3.2.2 :002 > Rails.configuration.middleware
 =>
#<ActionDispatch::MiddlewareStack:0x0000000107e2a3a8
 @middlewares=
  [ActionDispatch::HostAuthorization,
   Rack::Sendfile,
   ActionDispatch::Static,
   ActionDispatch::Executor,
   ActiveSupport::Cache::Strategy::LocalCache::Middleware,
   Rack::Runtime,
   Rack::MethodOverride,
   ActionDispatch::RequestId,
   ActionDispatch::RemoteIp,
   Rails::Rack::Logger,
   ActionDispatch::ShowExceptions,
   Sentry::Rails::CaptureExceptions,   <----- makes sure scope is isolated among other things
   ActionDispatch::DebugExceptions,
   Sentry::Rails::RescuedExceptionInterceptor,
   ActionDispatch::ActionableExceptions,
   ActionDispatch::Reloader,
   ActionDispatch::Callbacks,
   ActionDispatch::Cookies,
   ActionDispatch::Session::CookieStore,
   ActionDispatch::Flash,
   ActionDispatch::ContentSecurityPolicy::Middleware,
   ActionDispatch::PermissionsPolicy::Middleware,
   Rack::Head,
   Rack::ConditionalGet,
   Rack::ETag,
   Rack::TempfileReaper]>
sl0thentr0py commented 10 months ago

@st0012 do you know why the railtie would not trigger if they're packaging our gems inside a gem of their own?

johnvuko commented 10 months ago

In my real project it's not used the same way, I removed every thing I can to have a minimal project to reproduce.

In my real project, it's inside a Rails Engine (Engine inherit from Railtie) but the require of sentry-ruby and sentry-rails are called inside an initializer of the engine which have the same effect => not adding the middleware. If I do the require in the main file at the root folder of the engine, it works (the middleware is present).

Maybe the issue is because of the order engines initializers are loaded.

It is strange but I have now a solution.

sl0thentr0py commented 10 months ago

alright considering this resolved then

johnvuko commented 10 months ago

Sorry, the issue is not solved, it's just the middlewares are now correctly loaded, I thougth that was it but no. But i'm still having scope leakeage in the exception from my jobs.

sl0thentr0py commented 10 months ago

please update your sample repro app with the latest state then and reopen the issue