chaosdorf / chaospizza

🍕😎
MIT License
9 stars 2 forks source link

If SENTRY_DSN isn't properly configured, the app is stuck in a loop. #13

Open YtvwlD opened 6 years ago

YtvwlD commented 6 years ago

It just prints ** Database is unavailable - sleeping over and over again with no proper exception message.

Removing the >/dev/null 2>&1 in src/wait-for-db.sh reveals an error message.

msteinhoff commented 6 years ago

Why don't you configure sentry properly then? :^)

YtvwlD commented 6 years ago

A proper error message would still be nice. :3

msteinhoff commented 6 years ago

Root-cause analysis:

Chaospizza is written in Django because Django is awesome and all the cool kids were using it (and I needed a sandbox project to learn Django).

The Django framework does not care about the web server part in your application. It does not care where web requests come from, are handled, etc, it just implements WSGI. WSGI dictates that a web-accessible application has to provide a python module that contains this function signature: def application(environ, start_response) (guess what config/wsgi.py does) and then you pass that to the web server, e.g. line 18 in run.sh:

exec gunicorn config.wsgi:application

This is really awesome because it makes the web application a shared-nothing architecture with failure isolation. We use gunicorn for the web server part (responsible for listening on a socket, accepting connections, HTTP protocol handling, etc), and gunicorn by default starts a number of worker processes, each with its own python interpreter, and dispatches requests to all of them. A bug in the application code may crash such a worker process, but never the gunicorn process.

Django usually stores data in a database and needs a schema to work correctly. And the database schema may change during the lifecycle of an application. When the web server is started, Django expects the database to be online and to have the correct schema. So before starting the web server the initial schema must already have been created. If the schema has already been created but there are pending schema migrations, those must have been applied too. Ideally, those tasks should run automatically to free humans from operational burden. We need a way to run one-off initialization code when the application starts.

WSGI leaves all process architecture details to the web server implementation and has no concept for initialization hooks before the Django application starts. There is no place where you can put a call to run the schema migrations. Of course it's possible to e.g. put code in the wsgi module and setup "global" state there, but you are lying to yourself because in case e.g. gunicorn the state is not global-global but only worker-process-global, and you have no control over the worker-process lifecycle from the application perspective. There are ways to do this but this would require app-server specific code, but uhm nope.

So you just have to fucking duct-tape this python manage.py migrate command somewhere. And its 2018 and in container world it might be possible that the django container has been started before the database is available and if you naively put the migrate command in run.sh and start the django container before the database is available, it will just complain, crash and die.

Enter wait-for-db.sh and run.sh: wait until the database is available, then run one-off init code, finally run the actual gunicorn process.

The closest thing from Django that can be used as a database health check is the inspectdb command, it hits the database using credentials found in DJANGO_DATABASE_URL and does not modify anything. I put the >/dev/null 2>&1 in there to avoid spamming the log with stacktraces (when the DB is down it is down, 20+ line stacktraces won't help). But this creates a false-positive when anything else in the django config is broken.

Like I don't know, a missing environment variable.

Two things we can fix here:

  1. Make SENTRY_DSN optional in production mode (already done on my machine)
  2. Replace django inspectdb with custom checkdb command that hits the database, ignores DB exceptions, and returns reachable or notreachable on stdout and exit code zero.
YtvwlD commented 6 years ago

Make SENTRY_DSN optional in production mode (already done on my machine).

Or simply check if the variable is set in wait-for-db.sh?

msteinhoff commented 6 years ago

Or simply check if the variable is set in wait-for-db.sh?

This also works, but IMHO its cleaner to do in settings/prod.py:

RAVEN_CONFIG = {
    'dsn': env('SENTRY_DSN',default="")
}
YtvwlD commented 6 years ago

Yeah, that would be better.

But perhaps show a warning, if the environment is production and SENTRY_DSN isn't set?

msteinhoff commented 6 years ago

Should sentry be optional in production?

msteinhoff commented 6 years ago

@YtvwlD ping

YtvwlD commented 6 years ago

ping

pong

I would like to use (and keep using) Sentry in production.

But perhaps other people would deploy this somewhere else without.

I'm for making this optional, but displaying a warning.