st0012 opened this issue 3 years ago
Long-time user of sentry-ruby and sentry-raven. I didn't know this feature existed until a recent migration of a large Rails app to Sentry. I noticed it while reading through the docs and attempted to set it up, but immediately ran into issues and elected to defer in order to simplify the migration. In addition, we have a large number of complex queues. Injecting this option into our system would likely require a bit of thought so as not to bowl over important queues while keeping error reporting timely.
I suppose this is a vote to drop it?
Hey! We currently use the `async` option, mostly because it was recommended in the docs when we set up the app a long time ago (not sure the background worker option was even a thing at that time).
I'm curious about that part:
> The background worker doesn't queue more than 30 events. So even when there's a spike, it's unlikely to consume all the memory.
What would happen if there is a spike in events, let's say from a noisy error filling the queue. Would another error happening at the same time be silently dropped because the queue is full? Or am I misunderstanding how it works?
I'd like to switch away from `async` because the JSON serialization in the async version means we have to check two versions of the payload when doing custom processing in `before_send`, e.g.:

```ruby
data = event.to_hash
errors = data.dig(:exception, :values) if data[:exception]
errors = data.dig("exception", "values") if data["exception"]
```
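One way to avoid checking both payload shapes in `before_send` is to normalize the keys up front. A minimal sketch (the `deep_stringify_keys` helper below is my own, not part of the SDK; ActiveSupport ships an equivalent method under the same name for Rails apps):

```ruby
# Hand-rolled helper (not SDK code): recursively convert all hash keys
# to strings so the rest of before_send only handles one key style.
def deep_stringify_keys(obj)
  case obj
  when Hash
    obj.each_with_object({}) { |(k, v), h| h[k.to_s] = deep_stringify_keys(v) }
  when Array
    obj.map { |v| deep_stringify_keys(v) }
  else
    obj
  end
end

# Simulated symbol-keyed payload, as produced by event.to_hash:
data = deep_stringify_keys({ exception: { values: [{ type: "RuntimeError" }] } })
errors = data.dig("exception", "values")
# errors => [{ "type" => "RuntimeError" }]
```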
> not sure the background worker option was even a thing at that time
The background worker was added in v4.1.0, so it's a new thing for most users, I think.
> What would happen if there is a spike in events, let's say from a noisy error filling the queue. Would another error happening at the same time be silently dropped because the queue is full? Or am I misunderstanding how it works?
As of now, the queue doesn't distinguish between events. So if there's a spike of a particular error, other errors may not make it into the queue. But personally I'm not worried about this because:

- I don't recommend replacing the background worker with the `async` approach backed by a Sidekiq/Resque/etc. worker, due to the reason I described in the issue.
- If there's a spike big enough to overflow the SDK's queue and drop some events, it'll probably overflow your background job queue with the `async` option too and/or do greater damage to your system.
At my job we are working fine with `async`, because we already have Sidekiq in our stack. I can't understand why `async` can't remain an optional way to deliver events instead of being deprecated. As an alternative, you could highlight the possible issues with `async` in the documentation.

Maybe I don't understand the problem because we use Sentry only for exceptions, without APM.
@Bugagazavr There are two main costs to having this option around:

- `sentry-ruby` is the only SDK that supports such an option. This means we always need to consider this extra condition when making changes to the event-sending logic.
- It'll make future SDK alignment harder. (It certainly made the `sentry-raven` -> `sentry-ruby` conversion harder.)

If the upside were high, these costs wouldn't be an issue; that's why we've had it for many years. But since we already have a better solution to the problem (the background worker) with far fewer downsides, I don't think it's worth it now.
This issue has gone three weeks without activity. In another week, I will close it.
But! If you comment or otherwise update it, I will reset the clock, and if you label it `Status: Backlog` or `Status: In Progress`, I will leave it alone ... forever!

"A weed is but an unloved flower." (Ella Wheeler Wilcox)
We were using a custom Sidekiq task in an `async` block. But multiple times, we hit the issue described above:

> When there is an event spike, it can flood the medium storage (Redis) and take down the entire system.

And it was a nightmare. We just switched to the default by removing `async` and using the `BackgroundWorker` that is built on `concurrent-ruby`. It works perfectly for us for the moment. I will post a message on the next spike.
I think removing `async` seems fine. I was using it to queue up a small custom ActiveJob that just called Sentry, but I'm happy to use the built-in method. I agree event spikes are fine to cap at X events, since Sentry would drop them anyway and you don't want to overwhelm your system. You might consider telling users in the docs that they can just remove the async line and things should work fine. I'm testing that now to be sure, but I believe that's the recommendation?
@trevorturk Thanks for the feedback. The current documentation on Sentry and the RDoc both suggest that it's a deprecated option and point users to this issue for more info.
> I believe that's the recommendation?
Yes, you should be able to delete it directly without any issue. But if you run into a problem, please report it here and I'll fix it ASAP.
It all worked fine when I just deleted it, thank you!
> But for script programs, the process often exits before the worker is able to send the event. This is why `hint: { background: false }` is required in rake integrations.
Can you elaborate on why this is and why this wouldn't be in https://github.com/getsentry/sentry-ruby/blob/efcf170b5f6dd65c3b047825bddd8fde87fc6b7b/sentry-ruby/lib/sentry/rake.rb?
@dillonwelch Sorry, I forgot to update the description. That issue has been addressed in https://github.com/getsentry/sentry-ruby/pull/1617 and doesn't exist anymore.
For instrumenting AWS Lambda, it seems like it'd be ideal to use Lambda Extensions to deliver error reports. At first glance, it looks like the `async` option would be the way to integrate with Lambda Extensions. Do any of the Sentry libraries for other languages hook into Lambda Extensions? Would there be another way to do that with the Sentry Ruby SDK?
@jordan-brough I'm not familiar with AWS Lambda, but it looks like Sentry's Lambda Extension doesn't need language SDKs to work? It seems to integrate directly with AWS Lambda. Can you explain your use case in more detail?
@jordan-brough Currently, AWS Lambda on Ruby is not a first-class feature like it is for Python/Node, where we have separate packages/integrations/layers implemented. AWS extensions are also something we are looking into right now for both of those ecosystems. Ruby is not on the roadmap per se, but we would be open to seeing if that's something the community wants. I'm making a new issue to gauge interest, since it's tangential here.
@sl0thentr0py Thanks! I've commented over in that issue.
@st0012 How Lambda works is: AWS spins up an instance of some code you've written and invokes a "handler" method with an "event" payload (e.g. a web request). And then:

> After the function ... [has] completed, Lambda maintains the execution environment for some time in anticipation of another function invocation. In effect, Lambda freezes the execution environment. When the function is invoked again, Lambda thaws the environment for reuse.
https://docs.aws.amazon.com/lambda/latest/dg/runtimes-context.html
So you have some code that runs and then gets "frozen" and then maybe gets "unfrozen" again in the future to handle future events (e.g. web requests). But if you deploy new code, or if there is too long of delay between events then Lambda discards the frozen execution environment.
So if you have some async code, you may end up in a situation where that async code never runs, because Lambda ends up discarding the frozen execution environment. And Sentry events would get lost.
So currently, the way to make Lambda work reliably with Sentry would be to make Sentry operate synchronously. But then you have Sentry possibly slowing down your application code and/or potentially affecting it if there were Sentry bugs/outages, etc. (even though I'm sure that would never happen!). Which I assume is one reason why Sentry runs asynchronously in the first place.
Last year AWS released a solution to this issue called "Lambda Extensions". You can use Lambda Extensions to let a Lambda function handle application code synchronously while also enqueuing asynchronous events to "extensions" (e.g. a Sentry extension) that don't block the main application code.
A configuration option like the `async` config discussed in this issue might be a good way to integrate with that, though I haven't looked into the code here or the details of that in depth.
I agree with dropping the `async` option. I'm developing a Rails app that uses `config.async` with Sidekiq, but I will disable this config and use the background worker or send synchronously.

I was originally aware of this issue and was considering disabling `config.async`. However, before I disabled it, an error spike took Sidekiq's Redis down. I plan to disable `config.async` and use the background worker.

I am a little concerned about the queue limit. However, I checked the latency of our Rails app and of the error-reporting HTTP requests (to sentry.io), and queue overflow will probably not occur. It would be more reassuring to be able to observe queue overflows: sentry-ruby records a queue-overflow event and sends it in a client report (https://github.com/getsentry/sentry-ruby/blob/5.2.0/sentry-ruby/lib/sentry/client.rb#L63). I would be pleased if these queue-overflow errors were visualized in sentry.io's UI.

Sending errors synchronously doesn't look like a bad approach either. Our Rails app's APM reports that the latency of HTTP requests to sentry.io is almost always within 300ms. I feel it's not much of a problem for Rack workers to spend that amount of time when an error occurs.
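For reference, if I read the SDK's configuration docs correctly, synchronous sending doesn't require `config.async`: setting the worker pool size to zero should make the SDK send events from the calling thread. A sketch (the DSN env var name is just an example):

```ruby
# config/initializers/sentry.rb
Sentry.init do |config|
  config.dsn = ENV["SENTRY_DSN"]
  # A pool size of 0 disables the background worker, so events
  # are sent synchronously from the current (e.g. Rack) thread.
  config.background_worker_threads = 0
end
```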
> I would be pleased if these queue-overflow errors were visualized in sentry.io's UI.
@hogelog This is on the product roadmap, but I can't give you an ETA for when it will ship yet. We certainly want to expose these statistics to users eventually.
> Sending errors synchronously doesn't look like a bad approach either.
@hogelog I should also mention that if you have tracing enabled, `transaction` events need to be taken into consideration as well, so I wouldn't recommend sending events synchronously in that case.
> @hogelog This is on the product roadmap, but I can't give you an ETA for when it will ship yet. We certainly want to expose these statistics to users eventually.
I'm looking forward to it!
> @hogelog I should also mention that if you have tracing enabled, `transaction` events need to be taken into consideration as well, so I wouldn't recommend sending events synchronously in that case.
I'm not using tracing, so I was not aware. Considering the future, it may not be good to send events synchronously after all. Thanks!
I greatly appreciate this extra warning message! It's a good warning. Can you tell me exactly what I should do? Should I simply remove this code from my codebase, or do I need to add something else?

```ruby
config.async = lambda do |event, hint|
  ::Sentry::SendEventJob.perform_later(event, hint)
end
```
@ariccio You can simply delete it.
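If it helps anyone else migrating: with the lambda removed, nothing async-related needs to be added. A minimal sketch of the resulting initializer (the DSN env var name is just an example, and the thread count shown is only an optional tuning knob):

```ruby
# config/initializers/sentry.rb
Sentry.init do |config|
  config.dsn = ENV["SENTRY_DSN"]
  # No config.async: the SDK's built-in background worker
  # (a concurrent-ruby thread pool) sends events off-thread.
  # Optionally tune the pool size, which defaults to the
  # number of processors:
  # config.background_worker_threads = 5
end
```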
We at my company have been plagued by the async feature for 1-2 years now; hitting the payload size limit and the rate limit made us scratch our heads over how to safeguard against those (before Sentry internally remedied them). How did we catch these issues? By seeing Sidekiq brought to its knees and important messages going unprocessed.

Right now, we have decided to end the `async` reign, and I was glad to see this feature deprecated as of our last version update. :raised_hands:
The `async` option is unique to the Ruby SDK. It was designed to help send events asynchronously through different backends (e.g. a Ruby thread, a Sidekiq worker, etc.). Depending on the backend, it can pose a threat to the system through its extra memory consumption. So it's an option with some trade-offs.

But since version `4.1`, the SDK has had its own managed background worker (implemented with the famous `concurrent-ruby` library). It can handle most of the `async` option's use cases.

### The `async` Option Approach

**Pros**

- Users can customize their event-sending logic. But generally it's just a worker with `Sentry.send_event(event, hint)`.

**Cons**

- When there is an event spike, it can flood the medium storage (e.g. Redis) and take down the entire system.

### The Background Worker

(Implemented with `concurrent-ruby`.)

**Pros**

- It doesn't queue more than 30 events. So even when there's a spike, it's unlikely to consume all the memory.

**Cons**

- Unsent events will die with the process. Generally speaking, the queue time in the background worker is very low, and the chance of missing events for this reason is small in web apps. But for script programs, the process often exits before the worker is able to send the event. This is why `hint: { background: false }` is required in rake integrations. However, I don't think this problem can be solved with the `async` option either. This drawback has been addressed in https://github.com/getsentry/sentry-ruby/pull/1617.

### Missing Events During a Spike Because of the Queue Limit

I know many users are concerned that the background worker's 30-event queue limit will make them lose events during a spike. But as the maintainer and a user of this SDK, I don't worry about it because:

- I don't recommend replacing the background worker with the `async` approach backed by a Sidekiq/Resque/etc. worker, due to the reason I described in the issue.
- If there's a spike big enough to overflow the SDK's queue and drop some events, it'll probably overflow your background job queue with the `async` option too and/or do greater damage to your system.

What I'm trying to say is: it's not possible to expect Sentry to accept all events during a big spike, regardless of which approach you use. But when a spike happens, `async` is more likely to become another bottleneck and/or cause other problems in your system.

### My Opinion

The `async` option seems redundant now, and it can sometimes cause more harm. So I think we should drop it in version `5.0`.

### Questions

The above analysis is based only on my personal usage of the SDK and a few cases I helped debug. Even though the decision has been made, we would still like to hear your feedback:

- How do you use the `async` option in your apps?
- Have you tried the `background_worker_threads` config option?