appsignal / appsignal-elixir

🟪 AppSignal for Elixir package
https://www.appsignal.com/elixir
MIT License
282 stars 82 forks source link

Phoenix.Router.NoRouteError being reported on 404 after upgrade to Phoenix 1.4 and Cowboy 2 #417

Closed lpil closed 5 years ago

lpil commented 5 years ago

As in the title, since upgrading today 404s are showing up in my appsignal error log.

Here is an example error: https://appsignal.com/stunt-hamster-ltd/sites/5b8578b016b7e23a6a18c03b/incidents/43

jeffkreeftmeijer commented 5 years ago

Hey Louis! 👋

Thanks for letting us know about this. After some initial research, I was able to reproduce this issue on plug_cowboy ~> 2.0.0. I don’t run into the same issue on plug_cowboy ~> 1.0.0.

It looks like the exception, which is skipped in Appsignal.Plug because it has a 500 status code, gets matched in the ErrorHandler after receiving an error report from the error logger. We’re looking at why this is happening, and will update this issue when we know more.

lpil commented 5 years ago

Thank you! :)

jeffkreeftmeijer commented 5 years ago

In https://github.com/phoenixframework/phoenix/pull/2861/files#diff-0b0978dc41c81162d7eb10536f08fd76R42, a the Cowboy2Handler started exiting processes when an exception occurs. Because of this, there are two things happening here, which we should look into.

  1. The error is logged because the handler process crashes. Like any other process crash, this creates an error report in the logger, causing an error sent to AppSignal. In this case, since it's a 4xx-type error, we don't want that.

    We’re planning to look into error filtering. Instead of filtering in the Plug, we're investigating filtering when an error gets added to the Transaction (much like our ignore_errors configuration option), so we can take errors we don’t care about out there. This seems to be the only way, as 4xx type errors can now also crash the handler.

  2. Since the error is wrapped in a tuple before being sent over to the error logger, AppSignal fails to parse the error, resulting in string-based error names. We're planning an update to our error matcher, which should normalize everything sent to the error logger to an Elixir-style error.

    However, in this case, since that would result in a generic ErlangError, as these errors aren't in a format the Exception.normalize function can handle. We might have to unwrap these errors, like we do with Plug.Conn.WrapperErrors already.

As for right now; this patch will prevent Elixir errors nested in tuples (like ones raised from the Cowboy2Handler) to be sent to AppSignal altogether. Although it might be overkill, it didn't ignore any errors aside from these, and fixes this issue for now. It'd be great if you could test it by switching to the ignore-nested-errors branch and raising some errors locally.

Aside from that, we'll tackle https://github.com/appsignal/appsignal-elixir/issues/357 first, which will make sure we get properly normalized errors into AppSignal by implementing an updated error matcher. We'll then look at filtering in https://github.com/appsignal/appsignal-elixir/issues/410, to find a good way to reliably filter errors.

lpil commented 5 years ago

Hi, if I'm using the ignore-nested-errors branch is it expected that 404s are filtered out? I gave it a test and they are still reported.

jeffkreeftmeijer commented 5 years ago

Hey @lpil,

Could you try once more? I've updated the code to specifically match these nested errors, and I've added a when clause to make sure we only match errors with a plug_status of < 500.

For anyone reading along, the patch in this branch is for this very specific error. We're working on updating our error handling (#357) and filtering (#410) to have a single place where errors are unpacked and filtered out, and will release that in an upcoming version.

jeffkreeftmeijer commented 5 years ago

Quick update; we're reviewing https://github.com/appsignal/appsignal-elixir/pull/423, which closes #357, and will make sure errors are properly unpacked.

After we've merged that, we'll review https://github.com/appsignal/appsignal-elixir/tree/error-handler, which introduces a single error handler, which will enforce the same filtering rules for the Plug integration as our :error_logger integration. This will make sure errors with a plug_status of < 500 are filtered out properly.

lpil commented 5 years ago

Hey @jeffkreeftmeijer. Sorry about the delay!

I'm using the ignore-nested-errors branch and it seems to have silenced the 404s. :)

josevalim commented 5 years ago

Unfortunately exits are a bit awkward in Erlang because an exit can have any shape. However, exits caused by errors/exceptions follow a specific format, which is {error, stacktrace}. The error may be an Erlang error (i.e. any term) or an Elixir exception, therefore it has to be normalized. The issue is that an exit can also be {:hello, [:not_a_stacktrace]}, so we need a series of heuristics to handle both the exception and the stacktrace.

In fact we do this whole dance in our Logger.Translator.

I am not quite sure how AppSignal is processing those failures but I would recommend "listening" to Elixir's Logger (instead of Erlang's error_logger) from version v1.7+ if you don't yet. We heard that many people bypassed the Elixir Logger because we lost some metadata but from v1.7+ we actually process and include metadata for everyone. Plus, by leveraging our Logger, you got support for the new OTP logger transparently too.

However, if using our Logger is not an option, then you will have to replicate the normalization code linked above (you may do that already, I didn't check code, sorry :D).

svileng commented 5 years ago

Hi @lpil @jeffkreeftmeijer I have an issue with 404 exceptions showing up in AppSignal too, but I'm not sure if it is the same issue you're discussing here (I can't access the link @lpil posted). Here is my error/stacktrace:

Process #PID<0.1251.39> terminating: {{{%Phoenix.Router.NoRouteError{conn: %Plug.Conn{adapter: {Plug.Cowboy.Conn, :...}, assigns: %{}, before_send: [#Function<1.112466771/1 in Plug.Logger.call/2>], body_params: %{}, cookies: %Plug.Conn.Unfetched{aspect: :cookies}, halted: false, host: "[removed]", method: "GET", owner: #PID<0.1251.39>, params: %{}, path_info: ["sitemap.xml"], path_params: %{}, port: 80, private: %{MyAppWeb.Router => {[], %{}}, :appsignal_transaction => AppSignal.Transaction{jfit1os3m84cu}, :phoenix_endpoint => MyAppWeb.Endpoint, :phoenix_router => MyAppWeb.Router, :plug_session_fetch => #Function<1.58261320/1 in Plug.Session.fetch_session/1>}, query_params: %{}, query_string: "", remote_ip: {0, 0, 0, 0, 0, 65535, 2597, 46795}, req_cookies: %Plug.Conn.Unfetched{aspect: :cookies}, req_headers: [{"accept", "*/*"}, {"accept-encoding", "gzip"}, {"cf-connecting-ip", "46.229.168.145"}, {"cf-ipcountry", "US"}, {"cf-ray", "48353bf69fccc19a-IAD"}, {"cf-visitor", "{\"scheme\":\"https\"}"}, {"connect-time", "1"}, {"connection", "close"}, {"host", "[removed]"}, {"total-route-time", "0"}, {"user-agent", "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.semrush.com/bot.html)"}, {"via", "1.1 vegur"}, {"x-forwarded-for", "46.229.168.145, 162.158.79.102"}, {"x-forwarded-port", "443"}, {"x-forwarded-proto", "https"}, {"x-request-id", "5d1879a5-3553-4f5b-9803-fa133ee29e84"}, {"x-request-start", "1543831729996"}], request_path: "/sitemap.xml", resp_body: nil, resp_cookies: %{}, resp_headers: [{"cache-control", "max-age=0, private, must-revalidate"}, {"strict-transport-security", "max-age=31536000"}, {"x-request-id", "5d1879a5-3553-4f5b-9803-fa133ee29e84"}], scheme: :https, script_name: [], secret_key_base: :..., state: :unset, status: nil}, message: "no route found for GET /sitemap.xml (MyAppWeb.Router)", plug_status: 404, router: MyAppWeb.Router}, [{MyAppWeb.Router, :__match_route__, 4, [file: 'lib/myapp_web/router.ex', line: 1]}, {MyAppWeb.Router, :call, 2, [file: 'lib/phoenix/router.ex', line: 304]}, {MyAppWeb.Endpoint, :plug_builder_call, 2, [file: 'lib/myapp_web/endpoint.ex', line: 1]}, {MyAppWeb.Endpoint, :"call (overridable 2)", 2, [file: 'lib/myapp_web/endpoint.ex', line: 52]}, {MyAppWeb.Endpoint, :call, 2, [file: 'lib/myapp_web/endpoint.ex', line: 1]}, {Phoenix.Endpoint.Cowboy2Handler, :init, 2, [file: 'lib/phoenix/endpoint/cowboy2_handler.ex', line: 34]}, {:cowboy_handler, :execute, 2, [file: '/tmp/build_abab193e5b06074d62e0032c7fc6c904/deps/cowboy/src/cowboy_handler.erl', line: 41]}, {:cowboy_stream_h, :execute, 3, [file: '/tmp/build_abab193e5b06074d62e0032c7fc6c904/deps/cowboy/src/cowboy_stream_h.erl', line: 296]}]}, {MyAppWeb.Endpoint, :call, [%Plug.Conn{adapter: {Plug.Cowboy.Conn, :...}, assigns: %{}, before_send: [], body_params: %Plug.Conn.Unfetched{aspect: :body_params}, cookies: %Plug.Conn.Unfetched{aspect: :cookies}, halted: false, host: "[removed]", method: "GET", owner: #PID<0.1251.39>, params: %Plug.Conn.Unfetched{aspect: :params}, path_info: ["sitemap.xml"], path_params: %{}, port: 80, private: %{}, query_params: %Plug.Conn.Unfetched{aspect: :query_params}, query_string: "", remote_ip: {0, 0, 0, 0, 0, 65535, 2597, 46795}, req_cookies: %Plug.Conn.Unfetched{aspect: :cookies}, req_headers: [{"accept", "*/*"}, {"accept-encoding", "gzip"}, {"cf-connecting-ip", "46.229.168.145"}, {"cf-ipcountry", "US"}, {"cf-ray", "48353bf69fccc19a-IAD"}, {"cf-visitor", "{\"scheme\":\"https\"}"}, {"connect-time", "1"}, {"connection", "close"}, {"host", "[removed]"}, {"total-route-time", "0"}, {"user-agent", "Mozilla/5.0 (compatible; SemrushBot/2~bl; +http://www.semrush.com/bot.html)"}, {"via", "1.1 vegur"}, {"x-forwarded-for", "46.229.168.145, 162.158.79.102"}, {"x-forwarded-port", "443"}, {"x-forwarded-proto", "https"}, {"x-request-id", "5d1879a5-3553-4f5b-9803-fa133ee29e84"}, {"x-request-start", "1543831729996"}], request_path: "/sitemap.xml", resp_body: nil, resp_cookies: %{}, resp_headers: [{"cache-control", "max-age=0, private, must-revalidate"}], scheme: :http, script_name: [], secret_key_base: nil, state: :unset, status: nil}, []]}}, [{Phoenix.Endpoint.Cowboy2Handler, :init, 2, [file: 'lib/phoenix/endpoint/cowboy2_handler.ex', line: 43]}, {:cowboy_handler, :execute, 2, [file: '/tmp/build_abab193e5b06074d62e0032c7fc6c904/deps/cowboy/src/cowboy_handler.erl', line: 41]}, {:cowboy_stream_h, :execute, 3, [file: '/tmp/build_abab193e5b06074d62e0032c7fc6c904/deps/cowboy/src/cowboy_stream_h.erl', line: 296]}, {:cowboy_stream_h, :request_process, 3, [file: '/tmp/build_abab193e5b06074d62e0032c7fc6c904/deps/cowboy/src/cowboy_stream_h.erl', line: 274]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 249]}]}
(phoenix) lib/phoenix/endpoint/cowboy2_handler.ex:43: Phoenix.Endpoint.Cowboy2Handler.init/2
(cowboy) /tmp/build_abab193e5b06074d62e0032c7fc6c904/deps/cowboy/src/cowboy_handler.erl:41: :cowboy_handler.execute/2
(cowboy) /tmp/build_abab193e5b06074d62e0032c7fc6c904/deps/cowboy/src/cowboy_stream_h.erl:296: :cowboy_stream_h.execute/3
(cowboy) /tmp/build_abab193e5b06074d62e0032c7fc6c904/deps/cowboy/src/cowboy_stream_h.erl:274: :cowboy_stream_h.request_process/3
(stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

(originally reported here)

@lpil is this the same exception you're having?

jeffkreeftmeijer commented 5 years ago

Hey @svileng,

Correct, that's it. Could you check if the errors disappear when you switch to the ignore-nested-errors branch? We're reviewing a proper solution to this in #423, and we're hoping to release a beta for that shortly.

svileng commented 5 years ago

Cool, I haven't checked yet — I'll find some time to try it out and will report back! Thanks.

jeffkreeftmeijer commented 5 years ago

Hey @josevalim,

Thanks for taking the time to explain this here. We're updating our error handling in #423, to allow us to properly unpack them and make sure we always get an Exception object sent over to us. The fixes in there are based on Erlang's :error_logger (for backwards compatibility for older Elixir versions), still, but one of our next steps will be moving to Elixir's Logger.

Thanks!

jeffkreeftmeijer commented 5 years ago

Hey all,

Thanks for your patience while we've been working on this. We've just released version 1.9.0-beta.1 of the Elixir integration, which includes #423 and #425. Together, they should fix #357 by normalizing all errors to Elixir exceptions, and this issue by using the same error handler for both errors originating from the Plug integration, as well as the error logger.

It'd be great if you could switch to the beta, and let us know if it resolves this issue. You can do that by updating your :appsignal dependency to {:appsignal, "~> 1.9.0-beta.1"} in your projects' mix.exs files.

lukerandall commented 5 years ago

I've upgraded a brand new Phoenix 1.4 app to 1.9.0-beta.1 and it fixes the issue for me. 👍

jeffkreeftmeijer commented 5 years ago

1.9.0 was released, which includes the fix for this issue. Closing this one. Please re-open if you run into this on a version above 1.9.0.