accessibility-exchange / platform

The Accessibility Exchange platform.
https://github.com/orgs/accessibility-exchange/projects/2/views/8
BSD 3-Clause "New" or "Revised" License

SQLSTATE[HY000] [2002] Connection refused (SQL: select * from `sessions` where `id` = w67nFAnxed077E... #1602

Closed jobara closed 1 year ago

jobara commented 1 year ago

Illuminate\Database\QueryException

SQLSTATE[HY000] [2002] Connection refused (SQL: select * from `sessions` where `id` = w67nFAnxed077E2B750CHNiVwCdXJdO5xgGembsH limit 1 /*controller='RedirectController',action='__invoke',url='%2F',route='generated%3A%3AlUGnUf3YYG6WkkwN'*/;)

Illuminate\Database\Connection::runQueryCallback
/app/vendor/laravel/framework/src/Illuminate/Database/Connection.php

    // lot more helpful to the developer instead of just the database's errors.
    catch (Exception $e) {
        throw new QueryException(  // the exception in this report is raised here
            $query, $this->prepareBindings($bindings), $e
        );


marvinroman commented 1 year ago

I went through both the pod and cluster stats for the time period when the error occurred. There was a brief, small spike in memory usage and a drop in IOPS on the cluster at that time. It really shouldn't have caused issues, but it might have, given that the MySQL connection timeout is fairly low.

I would suggest we do 2 things:

  1. Create a pod healthcheck that doesn't hit MySQL.
  2. Bump up connect_timeout to 30.

jobara commented 1 year ago

@marvinroman any thoughts as to what caused the spike at that time? Also regarding the health check, I think we'd still want a way to make sure the database is still running properly. Is there another way to check this?

gtirloni commented 1 year ago

Since it was a connection refused issue, I think increasing the connection timeout value might not have the desired effect.
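To illustrate the difference, here is a minimal Python sketch (not the platform's code; `classify_connect` and `unused_local_port` are hypothetical helper names). It connects to a local port with no listener, assuming the loopback interface rejects the connection right away, as it normally does. The refusal comes back almost immediately, so a 30-second timeout never comes into play.

```python
import socket
import time


def classify_connect(host: str, port: int, timeout: float) -> str:
    """Try a TCP connect and report how the attempt ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # The peer actively rejected the connection: nothing is listening.
        return "refused"
    except socket.timeout:
        # No answer at all within `timeout` seconds.
        return "timed out"


def unused_local_port() -> int:
    """Ask the OS for a free port, then release it so nothing listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    port = unused_local_port()
    start = time.monotonic()
    outcome = classify_connect("127.0.0.1", port, timeout=30.0)
    print(outcome, f"after {time.monotonic() - start:.3f}s")
```

A longer timeout only changes how long the client waits for a silent peer; it does not change the outcome when the peer answers with a refusal.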

Also, if the app opens and closes the DB connection each time, then a transient issue clears up by itself... so I was going to suggest we adjust the connection retry, but that might not be necessary (I don't know if we're doing any kind of DB pooling anyway).
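For reference, a connection retry of the kind mentioned here could look roughly like the sketch below. It is plain Python with hypothetical names (`connect_with_retry`, `connect`), not anything from the platform or Laravel, and the attempt count and delays are placeholder values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(); on ConnectionError, wait and retry with doubling delays."""
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

`ConnectionRefusedError` is a subclass of `ConnectionError`, so a brief refusal like the one in this report would be retried, while a persistent outage still raises after the last attempt.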

If we create a healthcheck endpoint that doesn't hit MySQL, please let's use it just for the k8s checks. We still need something more on the functional side to indicate if the app as a whole is working.
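The split being discussed can be sketched as two separate checks. This is a hedged Python illustration with hypothetical names (`liveness`, `readiness`, `db_ping`), not the platform's actual routes: one check never touches MySQL and is meant for the k8s probe, the other exercises the database and reports whether the app as a whole is working.

```python
from typing import Callable, Tuple


def liveness() -> Tuple[int, str]:
    """Process-only check: never touches the database."""
    return 200, "alive"


def readiness(db_ping: Callable[[], None]) -> Tuple[int, str]:
    """Functional check: reports 503 when the database ping fails."""
    try:
        db_ping()
    except Exception as exc:
        return 503, f"database unavailable: {exc}"
    return 200, "ready"
```

With this split, a short database blip fails only the functional check, so the pod is not restarted over a problem that a restart would not fix.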

marvinroman commented 1 year ago

I will do a PR with the following options for health checks:

marvinroman commented 1 year ago

To clarify: these suggestions are only meant for internal K8s probes, not external probes.

@gtirloni I think you are right that connection timeouts probably won't help in this instance.

flare-error-tracker[bot] commented 1 year ago

Because you closed this issue the following error in Flare was marked as resolved: SQLSTATE[HY000] [2002] Connection timed out (SQL: select * from sessions where id = UeHhJaiJLHiPxaxv4PETP0cTOqXbUgKl440jWIJt limit 1)