getsentry / sentry-elixir

The official Elixir SDK for Sentry (sentry.io)
https://sentry.io
MIT License

Sentry outages cause Oban crons to no longer check in #811

Open joshk opened 16 hours ago

joshk commented 16 hours ago

Environment

Version 10.7.1
Oban cron integration enabled

Steps to Reproduce

Monitoring of Oban crons works perfectly, but when there is a Sentry outage the crons stop checking in, and they do not resume even once the outage is resolved.

After looking into our logs, we found this:

Handler Sentry.Integrations.Oban.Cron has failed and has been detached. Class=:error
Reason={:case_clause, {:error, :too_many_retries}}
Stacktrace=[
 {Sentry.Client, :maybe_log_send_result, 2,
 [file: ~c"lib/sentry/client.ex", line: 370]},
 {Sentry.Client, :send_check_in, 2,
 [file: ~c"lib/sentry/client.ex", line: 107]},
 {Sentry.Integrations.Oban.Cron, :handle_event, 4,
 [file: ~c"lib/sentry/integrations/oban/cron.ex", line: 27]},
 {:telemetry, :"-execute/3-fun-0-", 4,
 [file: ~c"/build/deps/telemetry/src/telemetry.erl", line: 167]},
 {:lists, :foreach_1, 2, [file: ~c"lists.erl", line: 2310]},
 {Oban.Queue.Executor, :record_started, 1,
 [file: ~c"lib/oban/queue/executor.ex", line: 97]},
 {Oban.Queue.Executor, :call, 1,
 [file: ~c"lib/oban/queue/executor.ex", line: 73]},
 {Task.Supervised, :invoke_mfa, 2,
 [file: ~c"lib/task/supervised.ex", line: 101]}
]

The message above was logged after a series of "Failed to send Sentry event. Received 503 from Sentry server" messages.

Once our app is restarted, everything works as expected and the crons are reported as healthy again.
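For readers unfamiliar with the `Reason={:case_clause, ...}` part of the log: it is what Elixir reports when a `case` expression receives a value that none of its clauses match. The following is a minimal sketch of that failure mode, not the SDK's actual code. The module `CaseClauseSketch`, the function `log_result/1`, and the clause shapes are all hypothetical; only the unmatched term `{:error, :too_many_retries}` is taken from the log above.

```elixir
defmodule CaseClauseSketch do
  # Hypothetical stand-in for a function that inspects the result of a
  # send attempt. The clause list is deliberately incomplete.
  def log_result(result) do
    case result do
      {:ok, _id} -> :ok
      {:error, {:request_failure, _reason}} -> :logged
    end
  end
end

# A result shape that no clause covers raises CaseClauseError; the
# unmatched value is available in the exception's :term field.
try do
  CaseClauseSketch.log_result({:error, :too_many_retries})
rescue
  e in CaseClauseError -> IO.inspect(e.term, label: "unmatched term")
end
```

When such an exception escapes from a telemetry handler, the handler is detached, which would explain why check-ins only resume after a restart re-attaches it.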

Expected Result

The SDK should recover from Sentry outages.

savhappy commented 6 hours ago

@whatyouhide I think this should just be wrapped in a try/catch statement. Yes?

  def attach_telemetry_handler(config) when is_list(config) do
    _ = :telemetry.attach_many(__MODULE__, @events, &__MODULE__.handle_event/4, config)
    :ok
  end

whatyouhide commented 5 hours ago

@savhappy definitely not, that's not what the stacktrace is pointing to. :telemetry.attach_many/4 doesn't raise; it just returns :ok | {:error, ...}, so we don't really care about that.
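To illustrate the distinction being made here, the sketch below attaches a handler twice and then fires an event whose handler raises. It assumes the `telemetry` package can be fetched with `Mix.install/1` when run as a script; the module `DetachSketch`, the handler id, and the event name are hypothetical. Attaching reports a duplicate id through its return value rather than raising, whereas a handler that raises during `:telemetry.execute/3` should be logged and removed from the handler list.

```elixir
Mix.install([{:telemetry, "~> 1.0"}])

defmodule DetachSketch do
  # Hypothetical handler: only {:ok, _} results are matched, so any other
  # result raises CaseClauseError inside the handler.
  def handle_event(_event, _measurements, %{result: result}, _config) do
    case result do
      {:ok, _id} -> :ok
    end
  end
end

event = [:sketch, :job, :start]

# Attaching the same handler id twice returns an error tuple; it does not raise.
first = :telemetry.attach("sketch-handler", event, &DetachSketch.handle_event/4, nil)
second = :telemetry.attach("sketch-handler", event, &DetachSketch.handle_event/4, nil)
IO.inspect({first, second}, label: "attach results")

# A successful run leaves the handler attached.
:telemetry.execute(event, %{}, %{result: {:ok, "id"}})
attached_before = length(:telemetry.list_handlers(event))

# A run where the handler raises: telemetry catches the failure and detaches it.
:telemetry.execute(event, %{}, %{result: {:error, :too_many_retries}})
attached_after = length(:telemetry.list_handlers(event))

IO.inspect({attached_before, attached_after}, label: "handlers before/after failure")
```

This is why wrapping the attach call would not help: the failure happens later, each time the event fires.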

The error comes from the call at the top of the stacktrace in the issue:

Stacktrace=[
 {Sentry.Client, :maybe_log_send_result, 2,
 [file: ~c"lib/sentry/client.ex", line: 370]} ...
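One possible direction, sketched here under assumptions, is to make the result-logging `case` total so that an unexpected error shape is logged instead of raising. This is not the SDK's implementation, which is not shown in this thread; the module `LogSendResultSketch` and the function `maybe_log/1` are hypothetical names, and the clause shapes are guesses made for illustration.

```elixir
defmodule LogSendResultSketch do
  require Logger

  # Hypothetical logging helper: every input has a matching clause, so the
  # function cannot raise CaseClauseError regardless of the error shape.
  def maybe_log(result) do
    case result do
      {:ok, _id} ->
        :ok

      {:error, reason} ->
        # Covers :too_many_retries and any other error reason.
        Logger.warning("Failed to send to Sentry: #{inspect(reason)}")
        :ok

      other ->
        # Defensive catch-all for results that are not tagged tuples.
        Logger.warning("Unexpected send result: #{inspect(other)}")
        :ok
    end
  end
end

:ok = LogSendResultSketch.maybe_log({:error, :too_many_retries})
```

Because the helper always returns `:ok`, a telemetry handler that calls it would not fail on this path and so would stay attached through an outage.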