Closed patmaddox closed 3 years ago
I'm sure this PR will need some more work (documenting the new function at a minimum). I am pretty new to Elixir and am not familiar with the different error types, so I'd appreciate some guidance on anything I might be missing.
I realize this library isn't for error reporting, but Google Error Reporting will automatically take care of that if the error is formatted correctly - which was easy to do, as you can see. So I hope this addition will be considered.
I have a bit more work to do to fully integrate this into our app, and will be working on it over the next few days, but I wanted to share it and get some early feedback. Thanks!
Hello @patmaddox, I like this feature a lot. We would love to aggregate errors from Elixir using native GoogleCloud services :).
I think it would be better to integrate it into the existing google_cloud_logger
formatter, so that if error contains a stack trace then we would properly format jsonPayload.message
and set a @type
(according to this doc it should work, but I'm not sure) but keep some original stacktrace in some other field (I think it would be kept in metadata already anyways).
@AndrewDryga I have tried both approaches - adding a separate GoogleErrorReporter
, and integrating directly into the GoogleCloudLogger
(which you can see at 472e16c). I've found the first way preferable. The main reason is that when explicitly logging, it includes an operation
value which is a unique identifier for the request. This lets you click on that identifier and find all logging statements that include it - so you can track down all log entries that are involved in the request that led to the error. When I integrated directly into GoogleCloudLogger
it didn't contain the operation
field, making the log information less useful.
So, feel free to use 472e16c as a starting point, and see if you can get it going. I'd be curious to see what part I missed.
This is working and is quite useful:
@patmaddox the list of errors looks good indeed. How do you use it? I'm asking because it would be really nice to have both JSON structured logs in GKE and errors reported, not just one of them (JSON logger setting for log provider is singular). Maybe we can generate a custom operation
instead?
I hooked it up with Plug.ErrorHandler.
Here's relevant config bits:
def router do
quote do
use Phoenix.Router
use Plug.ErrorHandler
use EpicenterWeb.ErrorHandler
import Plug.Conn
import Phoenix.Controller
import Phoenix.LiveView.Router
end
end
# This seems pretty generic and could probably get moved into logger_json library
defmodule EpicenterWeb.ErrorHandler do
alias LoggerJSON.Formatters.GoogleErrorReporter
defmacro __using__(_opts) do
quote do
defp handle_errors(conn, %{kind: kind, reason: reason, stack: stacktrace}) do
GoogleErrorReporter.report(kind, reason, stacktrace)
conn
end
end
end
end
config :logger_json, :google_error_reporter,
service_context: [
service: System.fetch_env!("ERROR_REPORTER_SERVICE_NAME"),
version: :code.priv_dir(:epicenter) |> Path.join("static/version.txt") |> File.read!() |> String.trim()
]
@AndrewDryga just following up on this - we've identified a pretty serious shortcoming with the separate ErrorHandler
approach. Because it's listed in the router, it will only be used when there are errors in Phoenix HTTP requests. It does not report errors that happen in web socket connections, or app initialization. So, we're going to go back to the approach of integrating it fully, so the default error handler adds the Google Error Reporting attribute.
The one thing that we lose with the integrated approach is that Phoenix no longer adds a request_id
or remote_ip
to the error. I believe this is because the Phoenix request runs in one process (with Logger.metadata
set), and the error handler runs in a different process and so doesn't have the request id or remote IP. We'll look into this more, but I am pretty new to Elixir / Phoenix and so am curious if you have any pointers on how we might use the default Elixir error handling while also maintaining any logger metadata that Phoenix adds.
@patmaddox I quickly checked the code and the only way I have in mind to persist request_id
in the Process metadata would be to have a double approach: hook up to Plug.ErrorHandler and still have default formatter being able to format any exception in a way that would report those errors to GoogleErrorReporter. The problem we would have then is preventing the same error to be reported twice and this is something I don't know how to approach yet. We might avoid re-raising the excepting and log directly but then we might face compatibility issues with other code that expects errors to be raised by Plug/Phoenix which we should avoid. Maybe we can add some metadata to https://github.com/elixir-plug/plug/blob/v1.11.0/lib/plug/error_handler.ex#L101 and propagate it to the logger but I also don't know how exactly from the top of my head.
Sorry for taking so long to reply, I'm buried in work and life :).
This is how AppSignal library deduplicates those reports: https://github.com/appsignal/appsignal-elixir-plug/blob/main/lib/appsignal_plug.ex#L61-L62. It ignores reports from processes that are going to raise an exception which looks like something logging library should not do because it would require some central coordinator to keep a list of those ignored pids.
I'm going to merge this and use format_stacktrace
you have added in GKE formatter. This way it would be enabled by default and if somebody wants to track error manually it still would be possible to do so, thank you for your contribution ❤️
@patmaddox one of the ways you can deduplicate logs can be subscribing to :exception
event that is mentioned here and ignore exception using a macro.
This provides a new function `LoggerJSON.Formatters.GoogleErrorReporter.report/3) that formats errors to be compatible with Google Error Reporting. Reporting an error with this function will create a new error record in GER, complete with stack trace. It uses two mechanisms to achieve this:
"@type": "type.googleapis.com/google.devtools.clouderrorreporting.v1beta1.ReportedErrorEvent"
metadata, which forces GER to record the error, in case something is wrong with the formatting.Aggregated errors
A single error instance