Closed C3n7ral051nt4g3ncy closed 1 year ago
@C3n7ral051nt4g3ncy thank you for reporting the disruption of service. We are investigating it now.
CC: @LuchoTurtle & @SimonLab time to add Monitoring+Alerts for QoS?
Database error:
** (exit) an exception was raised:
** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 922ms.
This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
1. Ensuring your database is available and that you can connect to it
2. Tracking down slow queries and making sure they are running fast enough
3. Increasing the pool_size (although this increases resource consumption)
4. Allowing requests to wait longer by increasing :queue_target and :queue_interval
See DBConnection.start_link/2 for more information
Looking directly at the log for the database on fly, it seems the postgres database is not available:
cmd/keeper.go:1526 failed to start postgres {"error": "postgres exited unexpectedly"}
@SimonLab Thanks for investigating. Curious what sort of volume triggered this crash.
What do we need to do to get Postgres back online?
Currently updating the postgres image with fly image update -a hits-db
Hopefully this will fix the issue.
Failed:
OK, cool. Thanks for documenting.
This might be a possible reason for the issue:
https://fly.io/docs/elixir/getting-started/#important-ipv6-settings
I don't know if there is a way to rebuild the Dockerfile with fly. I've added the line manually to the file.
A similar error is described here: https://community.fly.io/t/postgres-failed-to-connect-to-proxy-context-deadline-exceeded/8141/14
I think the free tier has reached its limit for the database and has switched to read-only, similar to https://community.fly.io/t/why-cant-i-restart-db-for-some-reason-it-does-not-work-although-the-status-says-running/9121
@nelsonic I think we might need to scale up the database for it to work properly again (I think that's the issue).
fly checks list -a hits-db
Looks like we need to spend a bit of money and upgrade the Postgres DB.
Could you just check how much disk space it's using?
Because I don't think it's the RAM that's the issue ...
I've added a comment on Fly: https://community.fly.io/t/postgres-failed-to-connect-to-proxy-context-deadline-exceeded/8141/17
OK, what do we need to do next? Can we resize the volume used by the hits-db instance so that PostgreSQL has more space?
In the end I don't think it's a "scale" issue. This post has the same error: https://community.fly.io/t/failure-postgres-stopped-working-failed-to-connect-to-proxy-context-deadline-exceeded/5432 and it looks like the issue is linked to the proxy on Fly.
If I don't get an answer to https://github.com/dwyl/hits/issues/203#issuecomment-1339037344 soon, I might write a new post and hopefully get a reply.
Hmmmm ... that's not great. Do you think that having the DB replicated across 2 (or more) Fly.io regions would mitigate the issue in future?
I'm not sure what the recommended way of managing a Postgres app is to make sure the data is always available. I need to research the documentation (https://fly.io/docs/reference/postgres-on-nomad/#about-fly-postgres) and the community (https://community.fly.io/) to get a better understanding.
@SimonLab looking forward to your conclusion. Happy to adopt any protocol you determine.
Created a new topic: https://community.fly.io/t/postgres-unavailable-context-deadline-exceeded/9227
@SimonLab @nelsonic: Everything seems to be working now.
running fly checks list -a hits-db
checkDisk: 8.03 GB (82.1%) free space on /data/ (50.07µs)[✓]
Cool. Looks like we need to increase this ASAP.
@SimonLab you should have admin rights.
Want to bump it to, say, 30GB of space?
That should last us a few months ...
(82.1%) free space
I understand from this that there is still a lot of available space, no?
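To make the "free" versus "used" reading of that check line explicit, here is a minimal Python sketch. It assumes the line keeps the shape quoted above ("<size> GB (<pct>%) free space on <mount>"). The helper name `parse_check_disk` and the returned keys are hypothetical and not part of any Fly tooling.

```python
import re

# Assumed shape of the line, taken from the output quoted in this thread:
#   "checkDisk: 8.03 GB (82.1%) free space on /data/ ..."
CHECK_DISK = re.compile(
    r"(?P<free_gb>\d+(?:\.\d+)?)\s*GB\s*"
    r"\((?P<free_pct>\d+(?:\.\d+)?)%\)\s*free space on (?P<mount>\S+)"
)


def parse_check_disk(line):
    """Hypothetical helper: extract free/used figures from one checkDisk line."""
    match = CHECK_DISK.search(line)
    if match is None:
        raise ValueError("line does not look like a checkDisk result")
    free_gb = float(match["free_gb"])
    free_pct = float(match["free_pct"])
    # The percentage in the output is FREE space, so used is the complement.
    used_pct = round(100.0 - free_pct, 1)
    # Rough volume size implied by the two figures (None if nothing is free).
    est_total_gb = round(free_gb / (free_pct / 100.0), 2) if free_pct > 0 else None
    return {
        "mount": match["mount"],
        "free_gb": free_gb,
        "free_pct": free_pct,
        "used_pct": used_pct,
        "est_total_gb": est_total_gb,
    }


if __name__ == "__main__":
    print(parse_check_disk("checkDisk: 8.03 GB (82.1%) free space on /data/"))
```

Read this way, 82.1% free means only about 17.9% of the volume is in use, so disk space does not look like the immediate constraint.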
Yeah, seems like the server has 50GB.
That should be more than "enough" for the foreseeable future.
Closing.
Thanks again @SimonLab
Thanks to all!!!
Hey,
Thanks for the great work you did on hits.
My hits counter has not been working for the last 48 hours. Server issues?
Repo: https://github.com/C3n7ral051nt4g3ncy/Masto