bugsnag / bugsnag-ruby

BugSnag error monitoring & reporting software for rails, sinatra, rack and ruby
https://docs.bugsnag.com/platforms/ruby
MIT License
246 stars 174 forks

Messages dropped because of a full thread queue #816

Closed: krisdigital closed this issue 2 months ago

krisdigital commented 5 months ago

Describe the bug

When the worker thread of the thread queue dies, messages are still pushed onto the queue but are never delivered until the process restarts.

Steps to reproduce

See the example code below. We found out by accident that messages were stuck and not delivered to Bugsnag. The logs showed the message "Dropping notification, 101 outstanding requests".

Environment

Example code snippet

This example code mirrors the behaviour in lib/bugsnag/delivery/thread_queue.rb:

p 'Start'
queue = Queue.new

queue.push(proc do
  p '1'
end)

queue.push(proc do
  p '2'
end)

queue.push(proc do
  p 1 / 0
end)

queue.push(proc do
  p '3'
end)

worker_thread = Thread.new do
  p 'Thread Start'
  while x = queue.pop
    x.call
  end
end

p "Alive: #{worker_thread.alive?}, Status: #{worker_thread.status}"

# worker_thread.join
sleep 3

p "Alive: #{worker_thread.alive?}, Status: #{worker_thread.status || 'nil'}"

Output:

"Start"
"Alive: true, Status: run"
"Thread Start"
"1"
"2"
#<Thread:0x00000001050fdb00 threads.rb:20 run> terminated with exception (report_on_exception is true):
threads.rb:13:in `/': divided by 0 (ZeroDivisionError)
    from threads.rb:13:in `block in <main>'
    from threads.rb:23:in `block in <main>'
"Alive: false, Status: nil"

Question: Could you check whether the worker_thread is still alive before pushing new messages and, if not, start a new thread? The code example shows that the thread reports as dead after the exception, so maybe this could be used as an indicator to create a new worker thread?
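The idea in the question above could be sketched roughly as follows. This is only an illustration, not bugsnag-ruby's implementation: the class name SelfHealingQueue and its methods are hypothetical, and the sketch assumes jobs are procs, as in the example code.

```ruby
# Hypothetical wrapper for illustration only; not part of bugsnag-ruby.
# It checks the worker thread on every push and starts a new one if the
# previous worker has died.
class SelfHealingQueue
  def initialize
    @queue = Queue.new
    @mutex = Mutex.new
    @worker = nil
  end

  # Enqueue a job (anything that responds to #call).
  def push(job)
    @mutex.synchronize do
      # Thread#alive? is false once the thread has terminated,
      # including termination by an unhandled exception.
      @worker = start_worker unless @worker&.alive?
    end
    @queue.push(job)
  end

  def worker_alive?
    !!@worker&.alive?
  end

  private

  def start_worker
    Thread.new do
      # Same loop shape as in the example code: an unhandled exception
      # raised by job.call terminates this thread.
      while (job = @queue.pop)
        job.call
      end
    end
  end
end
```

With this approach the job that raised is still lost, and a worker that dies between the alive? check and the next push is only replaced on the push after that, so jobs can sit in the queue until then.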

mclack commented 4 months ago

Hi @krisdigital

Thanks for raising this. We're looking into this and will update the thread as soon as we can.

clr182 commented 4 months ago

Hi @krisdigital

Thank you for your patience as we investigated this issue further.

The issue seems to stem from the worker thread terminating. Are you aware of any reasons why this may be happening? If so, could you please elaborate?

For background, the thread should stay alive until we stop it in an at_exit block: https://github.com/bugsnag/bugsnag-ruby/blob/990f8359d7dd34722a9f46d6d928df8d28c3a55a/lib/bugsnag/delivery/thread_queue.rb#L59-L63

krisdigital commented 4 months ago

Hi @clr182,

thank you for looking into it! I don't know why the thread terminated in our case, sadly. It may have been a bad message in an exception? But it is hard to tell.

The problem is that in this case the error reporting silently stops working. Would it maybe make sense to check if the thread is still running when a new message is pushed on the queue?

clr182 commented 4 months ago

Hi Kris,

We do believe the worker thread dying is the root cause of your issue in this case. As previously stated, this thread should always be alive until we stop it in an at_exit block. Perhaps you could add some further logging to determine why the thread is dying and investigate further from your side?
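As a general illustration of that kind of logging, a worker loop in code you control could rescue and log each failure so that the cause is recorded. This is a minimal sketch under assumptions: start_logged_worker is a hypothetical helper, not a bugsnag-ruby API, and it assumes the jobs are procs taken from a plain Queue.

```ruby
require "logger"

# Hypothetical helper for illustration only; not part of bugsnag-ruby.
# Runs jobs from the queue and logs any StandardError a job raises,
# instead of letting the exception terminate the worker thread.
def start_logged_worker(queue, logger)
  Thread.new do
    # Pushing nil onto the queue ends the loop and stops the worker.
    while (job = queue.pop)
      begin
        job.call
      rescue StandardError => e
        # Record the exception class, message, and first backtrace line.
        logger.error("worker job failed: #{e.class}: #{e.message} (#{e.backtrace&.first})")
      end
    end
  end
end
```

Rescuing StandardError per job keeps the worker alive after a bad job. Exceptions outside StandardError would still end the thread.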

krisdigital commented 3 months ago

Hi @clr182,

all right, thank you. We will keep looking for the reason the thread exited!

mclack commented 2 months ago

Hi @krisdigital

As there hasn't been any activity on the thread for a while, we are now going to close this issue.

If you continue to experience issues with this, or have any other questions, please feel free to reopen this or open a ticket with us directly by contacting support@bugsnag.com with further details or relevant information.