Closed by mvaled, 1 year ago.
Can you try adding

```yaml
http:
  _client: reqwest
```

to your config?
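For context, the option nests under the `http` section of Relay's `config.yml`. A minimal sketch of where it sits (the `relay` section values shown here are assumptions based on the upstream URL in the logs below, not part of the suggestion):

```yaml
relay:
  mode: managed
  upstream: http://web.sentry.svc:9190/
http:
  # Unstable option: temporarily switch between the actix and reqwest clients.
  _client: reqwest
```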
@untitaker
Thanks for your response. After that change, things took a turn for the worse:
```
2020-12-07T22:15:14Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-07T22:15:14Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
  caused by: No valid json body
2020-12-07T22:15:31Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-07T22:15:31Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
  caused by: No valid json body
```
Cool, that's worse. Do you have the capability to inspect the response body Sentry sends to Relay? I wonder why the request is invalid.
Context: we intend to switch to reqwest as the HTTP client (the option to switch back and forth is temporary), so I think it would be wasted time to investigate bugs in the old client.
> Do you have capability to inspect the response body of sentry to relay?
Probably, but right now I have no clue. I would have to look at the code to see if I could trap the request/response, or put `DEBUG=True` in the webserver: would it help?

Changing relay's logging level to DEBUG didn't help much, though.
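One low-tech way to trap what Relay sends, without touching the webserver, is to temporarily point Relay's `upstream` URL at a throwaway capture server that logs every request body. This is a sketch, not anything from the thread; the path and canned `{}` response are placeholders:

```python
# Minimal HTTP capture server: point Relay's `upstream` config at this
# port temporarily, and every POST body it sends gets recorded and printed.
# Note: this only reads Content-Length-framed bodies, so it would show a
# chunked request as empty -- which is itself a useful signal.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = []  # (path, headers, body) tuples seen so far

class CaptureHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        captured.append((self.path, dict(self.headers), body))
        print(self.path, body)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b"{}")

    def log_message(self, *args):  # silence the default access log
        pass

server = HTTPServer(("127.0.0.1", 0), CaptureHandler)  # port 0 = pick a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()
```

Anything Relay POSTs at it then shows up in `captured`, headers included.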
I'm looking at the code of the upstream actor, and I noticed I've never spotted the info log at:

```rust
log::info!("relay successfully registered with upstream");
slf.auth_state = AuthState::Registered;
slf.auth_backoff.reset();
```
But I've received events from clients, so... what's happening here? Am I missing something?
Well, I'm not getting events from clients right now. I'm triggering events in some of the applications and I don't see them appear in Sentry :(
Still, the logs just show:

```
2020-12-08T00:17:19Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-08T00:17:19Z [relay_server::actors::upstream] ERROR: authentication encountered error: could not send request to upstream
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: Connection reset by peer (os error 104)
2020-12-08T00:17:43Z [relay_server::actors::events] ERROR: error processing event: event exceeded its configured lifetime
2020-12-08T00:17:43Z [relay_server::actors::events] ERROR: error processing event: event exceeded its configured lifetime
```
Hmm... I ran `docker pull` for all the images: `snuba`, `relay`, `sentry`, and `symbolicator`. Now I'm getting these logs:
```
2020-12-08T01:33:42Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-08T01:33:42Z [relay_server::actors::upstream] INFO: relay successfully registered with upstream
2020-12-08T01:33:42Z [relay_server::actors::project_upstream] ERROR: error fetching project states: could not send request to upstream
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: IO error: Connection reset by peer (os error 104)
  caused by: Connection reset by peer (os error 104)
2020-12-08T01:33:43Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
  caused by: failed to fetch project state from upstream
2020-12-08T01:33:43Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
  caused by: failed to fetch project state from upstream
2020-12-08T01:33:43Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
  caused by: failed to fetch project state from upstream
2020-12-08T01:33:43Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
  caused by: failed to fetch project state from upstream
2020-12-08T01:33:43Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
  caused by: failed to fetch project state from upstream
2020-12-08T01:33:43Z [relay_server::actors::events] ERROR: error processing event: failed to resolve project information
  caused by: failed to fetch project state from upstream
```
I tried once more to use `reqwest`, but it failed again with 400 Bad Request. I wasn't able to capture the request to see why Sentry is complaining about it. And now I'm back to the cycle of `ERROR: authentication encountered error`...
I'll report back tomorrow if I find anything.
Best regards.
@mvaled I have opened #878 for the 400 bug; let me know if that fixes the issue.
@untitaker
As commented there, the issue still happens. As soon as I switch to `reqwest`, I get the 400 error. This is the full log for the first few seconds after restarting Relay with `reqwest`:
```
2020-12-17T12:31:43Z [relay::setup] INFO: launching relay from config folder /work/.relay
2020-12-17T12:31:43Z [relay::setup] INFO:   relay mode: managed
2020-12-17T12:31:43Z [relay::setup] INFO:   relay id: <redacted>
2020-12-17T12:31:43Z [relay::setup] INFO:   public key: <redacted>
2020-12-17T12:31:43Z [relay::setup] INFO:   log level: INFO
2020-12-17T12:31:43Z [relay_server::actors::outcome::processing] INFO: OutcomeProducer started.
2020-12-17T12:31:43Z [relay_server::actors::events] INFO: starting 8 event processing workers
2020-12-17T12:31:43Z [relay_server::actors::upstream] INFO: upstream relay started
2020-12-17T12:31:43Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-17T12:31:43Z [relay_server::actors::store] INFO: store forwarder started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::service] INFO: spawning http server
2020-12-17T12:31:43Z [relay_server::actors::project_redis] INFO: redis project cache started
2020-12-17T12:31:43Z [relay_server::service] INFO:   listening on: http://0.0.0.0:3000/
2020-12-17T12:31:43Z [actix_net::server::server] INFO: Starting 8 workers
2020-12-17T12:31:43Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
  caused by: No valid json body
2020-12-17T12:31:43Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-17T12:31:43Z [actix_net::server::server] INFO: Starting server on 0.0.0.0:3000
2020-12-17T12:31:43Z [relay_server::actors::controller] INFO: relay server starting
2020-12-17T12:31:43Z [relay_server::actors::connector] INFO: metered connector started
2020-12-17T12:31:43Z [relay_server::actors::events] INFO: event manager started
2020-12-17T12:31:43Z [relay_server::actors::project_local] INFO: project local cache started
2020-12-17T12:31:43Z [relay_server::actors::project_upstream] INFO: project upstream cache started
2020-12-17T12:31:43Z [relay_server::actors::project_cache] INFO: project cache started
2020-12-17T12:31:43Z [relay_server::actors::relays] INFO: key cache started
2020-12-17T12:31:43Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
  caused by: No valid json body
2020-12-17T12:31:44Z [relay_server::actors::upstream] INFO: registering with upstream (http://web.sentry.svc:9190/)
2020-12-17T12:31:44Z [relay_server::actors::upstream] ERROR: authentication encountered error: upstream request returned error 400 Bad Request
  caused by: No valid json body
```
Fair enough; we merged this because it fixed the issue for another customer. I'm not really sure what to try then.
> Probably, but right now I have no clue. I would have to look at the code to see if I could trap the request/response or put DEBUG=True in the webserver: would it help?
Assuming you're sure you're running the correct commit, let's try request capturing. I was thinking more of using Wireshark, assuming you have unencrypted traffic.
Hi @untitaker,
I'm rebuilding the image from master:

```shell
$ git log -1 --oneline
33f0dfbb feat: Enable use of multiple value types per field (#882)
$ docker build . -t getsentry/relay:nightly
```

So I'll be running relay with the right commit.
As for using Wireshark, I'm not really sure if I'm allowed. Let's find out.

Unfortunately, using Wireshark directly is not possible. I don't have physical access to the server and X11 forwarding via SSH is not allowed :(

I'm going to use `tcpdump` (or something like that) on the server and import the capture into Wireshark locally.
@untitaker
Using `tcpdump` I could sniff the traffic going to the upstream, once with actix and once with reqwest. I think I have spotted a difference:

On the left is the actix trace, on the right the reqwest one. The JSON payload using reqwest is an array (seemingly empty) instead of an object.
Looking again at the traces, I noticed my previous comment was about a response coming from the server, not the POST sent by the relay. So I looked again.

In the good trace, I observe the following:

Relay sends a POST to the upstream with a JSON payload `{relay_id: ..., public_key: ..., version: ...}`. There are two custom HTTP headers: `x-sentry-relay-id: bcfd9d5a-...`, which matches the `relay_id` in the payload; and `x-sentry-relay-signature`, which I assume is a signature of the HTTP request with headers included. Other headers are:

```
content-type: application/json
accept-encoding: gzip, deflate
user-agent: actix-web/0.7.19
content-length: 130
date: Fri, 18 Dec 2020 16:11:52 GMT
```

The upstream returns 200, with a JSON payload `{relay_id: ..., token: ...}`.
In the "bad" trace, a similar POST is made. The JSON payload is the same as before, but some HTTP headers are different:

```
content-type: application/json
content-encoding: identity
accept: */*
accept-encoding: gzip
transfer-encoding: chunked
```

I hope this provides some clues about what's happening.
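As an aside on the `x-sentry-relay-signature` header mentioned above: Relay's actual implementation signs the request body with the relay's key pair (it lives in the `relay-auth` crate, and is not shown in this thread). Purely as an illustration of the general shape of body signing, here is a *hypothetical* stand-in using HMAC, which is not what Relay actually uses:

```python
# Illustrative only: HMAC stand-in for a request-body signature.
# Relay itself uses public-key signatures, not a shared secret.
import hashlib
import hmac
import json

def sign_body(secret: bytes, body: bytes) -> str:
    """Return a hex signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

body = json.dumps({"relay_id": "bcfd9d5a-...", "public_key": "..."}).encode()
signature = sign_body(b"example-secret", body)

# The signature then travels alongside the identifying header:
headers = {
    "x-sentry-relay-id": "bcfd9d5a-...",
    "x-sentry-relay-signature": signature,
    "content-type": "application/json",
}
```

The upstream recomputes the signature over the received body and rejects the request on mismatch, which is why anything that mangles the body in transit breaks authentication.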
Thank you, this in fact points out the problem: `transfer-encoding: chunked`. The Sentry web container is not able to handle this, and we should not be creating such a request. We'll submit a bugfix for this, thank you for uncovering it. (Edit: https://github.com/getsentry/relay/pull/890)
The configuration option you're setting, `_client: reqwest`, is unstable and still being developed. I'm curious what @untitaker wanted to test with this. In the meantime, I would suggest switching back to the regular client.

As this is likely an issue with the way the containers are being run or networking is set up, I would propose moving this issue to getsentry/onpremise. Any objections, @BYK?
Closing since both self-hosted and Relay have changed substantially. If a similar bug is still occurring, please re-submit to the self-hosted repository. Thanks!
I've had an on-premises Sentry version 9 running for several years. Two weeks ago I found the time to update to Sentry 20. I took a look at your onpremise configuration and figured out a way to deploy it on our server.

I've noticed relay disconnects from the upstream web server every minute. I think this is caused by a `Keep-Alive` header somewhere. Can I increase it? Is there a more efficient protocol to connect relay to the upstream web server?
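If the minute-by-minute reconnects really are idle keep-alive timeouts, the server usually advertises them in a `Keep-Alive` response header. A small sketch of reading that header (the `timeout=60` value is hypothetical, chosen to match the observed one-minute interval):

```python
def parse_keep_alive(value: str) -> dict:
    """Parse a `Keep-Alive: timeout=60, max=100` style header value
    into a dict of integer parameters."""
    params = {}
    for part in value.split(","):
        if "=" in part:
            key, _, val = part.strip().partition("=")
            params[key.strip()] = int(val.strip())
    return params

# e.g. a server that closes idle connections after 60 seconds:
print(parse_keep_alive("timeout=60, max=100"))  # {'timeout': 60, 'max': 100}
```

Checking this header on the upstream's responses would confirm whether the disconnects come from a server-side idle timeout rather than from Relay itself.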