Closed by nodomain 3 years ago
Stack finished successfully but ingestion throws 503?
If that's the case, then setting all images to tag 1.4.0 might be a good workaround for now...
If not, you're going to have to check the ingestion logs, specifically the CloudWatch logs from the ECS containers for "relay" in the web ECS cluster.
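For reference, a command sketch for pulling those relay logs with the AWS CLI. The log group name below is an assumption; use whatever `awslogs` log group your relay task definition actually writes to.

```shell
# Hypothetical log group name -- check the awslogs configuration
# in your relay task definition for the real one.
aws logs filter-log-events \
  --log-group-name "/ecs/sentry-web/relay" \
  --filter-pattern "ERROR" \
  --max-items 50
```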
I'll start a stack myself in the meantime from scratch to double check.
PS: you might want to check those logs before updating the images anyway...
Yes, the stack finished but returns a 503. Checking the 1.4.0 workaround right now - I'll leave ClickHouse at v1.5.0, right?
Ingest back up and running.
Awesome. Will check the compatibility again with latest release from sentry.
Cheers
Release 1.6.0 fixed the ingest error @nodomain !
Cool, will have a look. I set up a staging stack now so as not to break production. Meanwhile the prod environment just stopped processing new events. I'll check with 1.6.0 then.
And as a side note: perhaps you could use squash commits when merging into main; things would be easier to follow with fewer commits.
Might want to check your monitoring data.
You might be DDoS-ing yourself. If that's the case, either reduce the trace sampling or increase the instance sizes for Redis/Postgres.
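On the sampling side, here is a minimal sketch of what cutting the trace volume client-side could look like. The function has the shape `sentry_sdk` expects for a `traces_sampler` callback; the paths and rates are illustrative assumptions, not values from this thread.

```python
# Sketch: sample high-volume traffic (e.g. CSP reports) more aggressively
# than the rest, instead of flooding the self-hosted ingest.
def traces_sampler(sampling_context):
    # sampling_context["transaction_context"]["name"] holds the
    # transaction name when the SDK provides one.
    name = sampling_context.get("transaction_context", {}).get("name", "")
    if name.startswith("/csp-report"):
        return 0.01  # keep 1% of the noisy endpoint
    return 0.1       # default 10% sample rate elsewhere

# Wire it up in the client (shown commented to stay self-contained):
# import sentry_sdk
# sentry_sdk.init(dsn="...", traces_sampler=traces_sampler)
```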
Thanks for the hint. I've now resized Redis and RDS accordingly; let's see if that brings it back to life :-)
Redis memory was the culprit: 100% full and swapping. I resized it and am now waiting for new events to come in.
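For anyone hitting the same wall: the quickest way to spot this is `redis-cli INFO memory`. A small sketch that parses that output and reports the usage ratio; the field names (`used_memory`, `maxmemory`) are real Redis INFO keys, but the parsing here is a minimal assumption-laden helper, and `maxmemory` is 0 when no limit is configured.

```python
def memory_usage(info_text: str) -> float:
    """Return used_memory / maxmemory from `redis-cli INFO memory` output.

    Returns 0.0 when no maxmemory limit is set (Redis reports 0)."""
    fields = dict(
        line.split(":", 1)
        for line in info_text.splitlines()
        if ":" in line and not line.startswith("#")
    )
    used = int(fields["used_memory"])
    maxmem = int(fields["maxmemory"])
    return used / maxmem if maxmem else 0.0
```

A ratio near 1.0 means the instance is full and (depending on `maxmemory-policy`) about to evict or stall, which matches the symptom above.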
Nothing changed. I also stopped all tasks in the workers group in ECS so they would restart. Any other tips from your experience? No real errors in the logs :-/
When that happened to me, it was due to a poorly defined client DSN.
Check the "installation instructions" for a random project and try to submit an exception.
And go from there...
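A quick sanity check on the DSN itself can rule that out before digging further. A Sentry DSN has the shape `https://<public_key>@<host>/<project_id>`, and the classic store endpoint derived from it is `https://<host>/api/<project_id>/store/`. A minimal sketch that validates the shape and builds that URL (the hostname in the example is hypothetical):

```python
from urllib.parse import urlsplit

def store_endpoint(dsn: str) -> str:
    """Validate a DSN's shape and return the store endpoint it implies."""
    parts = urlsplit(dsn)
    project_id = parts.path.strip("/")
    # A usable DSN needs a public key (the userinfo part) and a numeric
    # project id as its path.
    if not parts.username or not project_id.isdigit():
        raise ValueError("malformed DSN: %s" % dsn)
    return "%s://%s/api/%s/store/" % (parts.scheme, parts.hostname, project_id)
```

If the DSN parses cleanly but a test exception still never arrives, the problem is on the stack side rather than the client configuration.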
PS: remember the SSL problem for the ingest record mentioned in the README.
After resizing MSK as well, things seem to be coming back to life. Good learning experience :) Using Sentry as a CSP report endpoint for a high-traffic site does not co-exist with small t* instances... Furthermore, the 1.6.0 upgrade went smoothly.
Thanks for your great efforts!
Hi,
I completely re-set up everything from scratch, but the latest version now only returns a 503 on ingest. Hence no events are captured.
Thanks for any ideas.