Web not working, worker timeout

languagemaniac commented 10 months ago

Issue

Hi, I have an issue and no idea what caused it. It was working fine yesterday. I have this installed on 64-bit Raspberry Pi OS on my Raspberry Pi.

The database is apparently OK. In any case, I made a backup in SQL format as described in the docs.

I have no idea where to start. I deleted the web container and recreated it, but the issue persists.

Tandoor Version

Latest; I can't see the version number since the web interface is down.

OS Version

Raspberry Pi OS Bullseye

Setup

Docker / Docker-Compose

Reverse Proxy

No reverse proxy

Other

No response

Environment file

# only set this to true when testing/debugging
# when unset: 1 (true) - don't unset this, just for development
DEBUG=0
SQL_DEBUG=0
DEBUG_TOOLBAR=0
# Gunicorn log level for debugging (default value is "info" when unset)
# (see https://docs.gunicorn.org/en/stable/settings.html#loglevel for available settings)
# GUNICORN_LOG_LEVEL="debug"

# HTTP port to bind to
TANDOOR_PORT=9000

# hosts the application can run under e.g. recipes.mydomain.com,cooking.mydomain.com,...
ALLOWED_HOSTS=*

# Cross Site Request Forgery protection
# (https://docs.djangoproject.com/en/4.2/ref/settings/#std-setting-CSRF_TRUSTED_ORIGINS)
# CSRF_TRUSTED_ORIGINS = []

# Cross Origin Resource Sharing
# (https://github.com/adamchainz/django-cors-header)
# CORS_ALLOW_ALL_ORIGINS = True

# random secret key, use for example `base64 /dev/urandom | head -c50` to generate one
# ---------------------------- AT LEAST ONE REQUIRED -------------------------
SECRET_KEY=key
SECRET_KEY_FILE=
# ---------------------------------------------------------------

# your default timezone See https://timezonedb.com/time-zones for a list of timezones
TIMEZONE=Europe/Berlin

# add only a database password if you want to run with the default postgres, otherwise change settings accordingly
DB_ENGINE=django.db.backends.postgresql
# DB_OPTIONS= {} # e.g. {"sslmode":"require"} to enable ssl
POSTGRES_HOST=db_recipes
POSTGRES_PORT=5432
POSTGRES_USER=djangouser
# ---------------------------- AT LEAST ONE REQUIRED -------------------------
POSTGRES_PASSWORD=key
POSTGRES_PASSWORD_FILE=
# ---------------------------------------------------------------
POSTGRES_DB=djangodb

# database connection string, when used overrides other database settings.
# format might vary depending on backend
# DATABASE_URL = engine://username:password@host:port/dbname

# the default value for the user preference 'fractions' (enable/disable fraction support)
# default: disabled=0
FRACTION_PREF_DEFAULT=0

# the default value for the user preference 'comments' (enable/disable commenting system)
# default comments enabled=1
COMMENT_PREF_DEFAULT=1

# Users can set an amount of time after which the shopping list is refreshed when they are in viewing mode
# This is the minimum interval users can set. Setting this too low will allow users to refresh very frequently, which
# might cause high load on the server. (Technically they can obviously refresh as often as they want with their own scripts)
SHOPPING_MIN_AUTOSYNC_INTERVAL=5

# Default for user setting sticky navbar
# STICKY_NAV_PREF_DEFAULT=1

# If base URL is something other than just / (you are serving a subfolder in your proxy for instance http://recipe_app/recipes/)
# Be sure to not have a trailing slash: e.g. '/recipes' instead of '/recipes/'
# SCRIPT_NAME=/recipes

# If staticfiles are stored at a different location uncomment and change accordingly, MUST END IN /
# this is not required if you are just using a subfolder
# This can either be a relative path from the applications base path or the url of an external host
# STATIC_URL=/static/

# If mediafiles are stored at a different location uncomment and change accordingly, MUST END IN /
# this is not required if you are just using a subfolder
# This can either be a relative path from the applications base path or the url of an external host
# MEDIA_URL=/media/

# Serve mediafiles directly using gunicorn. Basically everyone recommends not doing this. Please use any of the examples
# provided that include an additional nginx container to handle media file serving.
# If you know what you are doing turn this back on (1) to serve media files using Django's serve() method.
# when unset: 1 (true) - this is temporary until an appropriate amount of time has passed for everyone to migrate
GUNICORN_MEDIA=0

# GUNICORN SERVER RELATED SETTINGS (see https://docs.gunicorn.org/en/stable/design.html#how-many-workers for recommended settings)
# GUNICORN_WORKERS=1
# GUNICORN_THREADS=1

# S3 Media settings: store mediafiles in s3 or any compatible storage backend (e.g. minio)
# as long as S3_ACCESS_KEY is not set S3 features are disabled
# S3_ACCESS_KEY=
# S3_SECRET_ACCESS_KEY=
# S3_BUCKET_NAME=
# S3_REGION_NAME= # default none, set your region might be required
# S3_QUERYSTRING_AUTH=1 # default true, set to 0 to serve media from a public bucket without signed urls
# S3_QUERYSTRING_EXPIRE=3600 # number of seconds querystring are valid for
# S3_ENDPOINT_URL= # when using a custom endpoint like minio
# S3_CUSTOM_DOMAIN= # when using a CDN/proxy to S3 (see https://github.com/TandoorRecipes/recipes/issues/1943)

# Email Settings, see https://docs.djangoproject.com/en/3.2/ref/settings/#email-host
# Required for email confirmation and password reset (automatically activates if host is set)
# EMAIL_HOST=
# EMAIL_PORT=
# EMAIL_HOST_USER=
# EMAIL_HOST_PASSWORD=
# EMAIL_USE_TLS=0
# EMAIL_USE_SSL=0
# email sender address (default 'webmaster@localhost')
# DEFAULT_FROM_EMAIL=
# prefix used for account related emails (default "[Tandoor Recipes] ")
# ACCOUNT_EMAIL_SUBJECT_PREFIX=

# allow authentication via the REMOTE-USER header (can be used for e.g. authelia).
# ATTENTION: Leave off if you don't know what you are doing! Enabling this without proper configuration will enable anybody
#   to login with any username!
# See docs for additional information: https://docs.tandoor.dev/features/authentication/#reverse-proxy-authentication
# when unset: 0 (false)
REMOTE_USER_AUTH=0

# Default settings for spaces, apply per space and can be changed in the admin view
# SPACE_DEFAULT_MAX_RECIPES=0 # 0=unlimited recipes
# SPACE_DEFAULT_MAX_USERS=0 # 0=unlimited users per space
# SPACE_DEFAULT_MAX_FILES=0 # Maximum file storage for space in MB. 0 for unlimited, -1 to disable file upload.
# SPACE_DEFAULT_ALLOW_SHARING=1 # Allow users to share recipes with public links

# allow people to create local accounts on your application instance (without an invite link)
# social accounts will always be able to sign up
# when unset: 0 (false)
# ENABLE_SIGNUP=0

# If signup is enabled you might want to add a captcha to it to prevent spam
# HCAPTCHA_SITEKEY=
# HCAPTCHA_SECRET=

# if signup is enabled you might want to provide urls to data protection policies or terms and conditions
# TERMS_URL=
# PRIVACY_URL=
# IMPRINT_URL=

# enable serving of prometheus metrics under the /metrics path
# ATTENTION: view is not secured (as per the prometheus default way) so make sure to secure it
# through your web server (or leave it open if you don't care if the stats are exposed)
# ENABLE_METRICS=0

# allows you to setup OAuth providers
# see docs for more information https://docs.tandoor.dev/features/authentication/
# SOCIAL_PROVIDERS = allauth.socialaccount.providers.github, allauth.socialaccount.providers.nextcloud,

# Should a newly created user from a social provider get assigned to the default space and given permission by default ?
# ATTENTION: This feature might be deprecated in favor of a space join and public viewing system in the future
# default 0 (false), when 1 (true) users will be assigned space and group
# SOCIAL_DEFAULT_ACCESS = 1

# if SOCIAL_DEFAULT_ACCESS is used, which group should be added
# SOCIAL_DEFAULT_GROUP=guest

# Django session cookie settings. Can be changed to allow a single django application to authenticate several applications
# when running under the same database
# SESSION_COOKIE_DOMAIN=.example.com
# SESSION_COOKIE_NAME=sessionid # use this only to not interfere with non unified django applications under the same top level domain

# by default SORT_TREE_BY_NAME is disabled this will store all Keywords and Food in the order they are created
# enabling this setting makes saving new keywords and foods very slow, which doesn't matter in most usecases.
# however, when doing large imports of recipes that will create new objects, can increase total run time by 10-15x
# Keywords and Food can be manually sorted by name in Admin
# This value can also be temporarily changed in Admin, it will revert the next time the application is started
# This will be fixed/changed in the future by changing the implementation or finding a better workaround for sorting
# SORT_TREE_BY_NAME=0
# LDAP authentication
# default 0 (false), when 1 (true) list of allowed users will be fetched from LDAP server
#LDAP_AUTH=
#AUTH_LDAP_SERVER_URI=
#AUTH_LDAP_BIND_DN=
#AUTH_LDAP_BIND_PASSWORD=
#AUTH_LDAP_USER_SEARCH_BASE_DN=
#AUTH_LDAP_TLS_CACERTFILE=
#AUTH_LDAP_START_TLS=

# Enables exporting PDF (see export docs)
# Disabled by default, uncomment to enable
# ENABLE_PDF_EXPORT=1

# Recipe exports are cached for a certain time by default, adjust time if needed
# EXPORT_FILE_CACHE_DURATION=600

Docker-Compose file

version: "3"
services:
  db_recipes:
    restart: always
    image: postgres:16.0-bullseye
    volumes:
      - ./postgresql:/var/lib/postgresql/data
    env_file:
      - ./.env

  web_recipes:
    restart: always
    image: vabene1111/recipes
    env_file:
      - ./.env
    volumes:
      - staticfiles:/opt/recipes/staticfiles
      - nginx_config:/opt/recipes/nginx/conf.d
      - ./mediafiles:/opt/recipes/mediafiles
      - ./externalfiles:/opt/recipes/externalfiles
    depends_on:
      - db_recipes

  nginx_recipes:
    image: nginx:latest
    restart: always
    ports:
      - 9000:80
    env_file:
      - ./.env
    depends_on:
      - web_recipes
    volumes:
      - nginx_config:/etc/nginx/conf.d:ro
      - staticfiles:/static:ro
      - ./mediafiles:/media:ro

volumes:
  nginx_config:
  staticfiles:

Relevant logs

Checking configuration...
Waiting for database to be ready...
Database is ready
Migrating database
Operations to perform:
  Apply all migrations: account, admin, auth, authtoken, contenttypes, cookbook, oauth2_provider, sessions, sites, socialaccount
Running migrations:
  No migrations to apply.
  Your models in app(s): 'cookbook' have changes that are not yet reflected in a migration, and so won't be applied.
  Run 'manage.py makemigrations' to make new migrations, and then re-run 'manage.py migrate' to apply them.
Generating static files
js-reverse file written to /opt/recipes/cookbook/static/django_js_reverse

1 static file copied to '/opt/recipes/staticfiles', 643 unmodified, 1375 post-processed.
Done
[2023-12-24 14:21:19 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2023-12-24 14:21:19 +0000] [1] [INFO] Listening at: http://0.0.0.0:9000 (1)
[2023-12-24 14:21:19 +0000] [1] [INFO] Using worker: gthread
[2023-12-24 14:21:20 +0000] [13] [INFO] Booting worker with pid: 13
[2023-12-24 14:21:20 +0000] [14] [INFO] Booting worker with pid: 14
[2023-12-24 14:21:20 +0000] [15] [INFO] Booting worker with pid: 15
[2023-12-24 14:21:50 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:13)
[2023-12-24 14:21:51 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:14)
[2023-12-24 14:21:51 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:15)
[2023-12-24 15:21:51 +0100] [13] [INFO] Worker exiting (pid: 13)
[2023-12-24 15:21:51 +0100] [15] [INFO] Worker exiting (pid: 15)
[2023-12-24 15:21:51 +0100] [14] [INFO] Worker exiting (pid: 14)
[2023-12-24 14:21:52 +0000] [1] [WARNING] Worker with pid 13 was terminated due to signal 9
[2023-12-24 14:21:52 +0000] [1] [WARNING] Worker with pid 15 was terminated due to signal 9
[2023-12-24 14:21:52 +0000] [1] [WARNING] Worker with pid 14 was terminated due to signal 9
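
A log like the one above can be summarized with a few `grep` counts. This is only a minimal sketch: the function name `summarize_gunicorn_log` is hypothetical, and it assumes the container output has first been saved to a file. Note that the workers boot at 14:21:20 and time out at 14:21:50, which matches gunicorn's default `timeout` of 30 seconds.

```shell
# Hypothetical helper: count booted workers, timeouts and SIGKILLs
# in a saved gunicorn log file passed as the first argument.
summarize_gunicorn_log() {
  log_file="$1"
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  booted=$(grep -c 'Booting worker with pid' "$log_file" || true)
  timeouts=$(grep -c 'WORKER TIMEOUT' "$log_file" || true)
  killed=$(grep -c 'terminated due to signal 9' "$log_file" || true)
  echo "booted=$booted timeouts=$timeouts sigkill=$killed"
}
```

If every booted worker also shows up as a timeout and a signal 9, as it does here, the workers never finished starting, which suggests looking at startup time and memory rather than at individual requests.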

languagemaniac commented 10 months ago

I have deleted all the containers, including the database, and recreated everything. The issue persists.

Maybe I updated something and it somehow broke Tandoor?

By the way, happy holidays! I don't expect an answer during these days, but I'm posting this here to keep the issue updated.

smilerz commented 10 months ago

Comment out these two lines. I don't know if it's related, but I don't know what the behavior is when they are set to a blank value.

POSTGRES_PASSWORD_FILE=
SECRET_KEY_FILE=

languagemaniac commented 10 months ago

I did it but the issue persists after rebooting.

smilerz commented 10 months ago

Changes won't take effect at reboot; you need to rebuild the container.
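
As a minimal sketch of what that means in practice: values from `env_file` are read when a container is created, so the container has to be recreated after `.env` is edited. The commands below assume the Compose v2 CLI (`docker compose`; older installs use `docker-compose` instead) and the `web_recipes` service name from the compose file in the issue. The `run` wrapper and the `DRY_RUN` variable are hypothetical conveniences for previewing the commands.

```shell
# Hypothetical wrapper: with DRY_RUN=1 the command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Recreate the web container so it picks up the edited .env, then show recent logs.
# Call this from the directory that contains docker-compose.yml.
recreate_web() {
  run docker compose up -d --force-recreate web_recipes
  run docker compose logs --tail=50 web_recipes
}
```

Calling `recreate_web` with `DRY_RUN=1` set only prints the two commands, which is a safe way to check them before running them for real.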

languagemaniac commented 10 months ago

OK, I rebuilt it, but sadly the issue persists.

languagemaniac commented 10 months ago

I think I might have found a solution:

https://stackoverflow.com/questions/10855197/frequent-worker-timeout

An answer there says:

We had the same problem using Django+nginx+gunicorn. Following the Gunicorn documentation, we configured graceful-timeout, which made almost no difference.

After some testing, we found the solution: the parameter to configure is timeout (and not graceful-timeout). It works like clockwork.

So, do:

1) Open the gunicorn configuration file.

2) Set TIMEOUT to whatever you need; the value is in seconds.

NUM_WORKERS=3
TIMEOUT=120

exec gunicorn ${DJANGO_WSGI_MODULE}:application \
--name $NAME \
--workers $NUM_WORKERS \
--timeout $TIMEOUT \
--log-level=debug \
--bind=127.0.0.1:9000 \
--pid=$PIDFILE

But how can I access this file? I don't know where it is on my system. I looked up the version of gunicorn installed on my system, and it doesn't match the one in the log output (the part where it says "Starting gunicorn 20.1.0").

So maybe the conf file is inside the container?

I tried searching for it but didn't find it

smilerz commented 10 months ago

The most likely cause of "Worker with pid X was terminated due to signal 9" type errors is a low-memory condition. You can try setting worker and thread counts.
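
For reference, the `.env` posted in the issue already contains commented-out `GUNICORN_WORKERS` and `GUNICORN_THREADS` entries, and the log shows three workers booting. A minimal sketch of lowering both, assuming a single worker with a single thread is acceptable for a small install on a low-memory host (the values are an assumption, not a tested recommendation):

```shell
# In .env: uncomment and set the gunicorn worker and thread counts.
# Assumed values for a low-memory host; adjust as needed.
GUNICORN_WORKERS=1
GUNICORN_THREADS=1
```

The web container has to be recreated afterwards for the new values to be picked up.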

But you'll need to troubleshoot memory conditions on the host machine to identify root cause.
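
A first look at memory on a Linux host can be as simple as reading `MemAvailable` from `/proc/meminfo`. This is a minimal sketch, not a full diagnosis; the function name `mem_available_mib` is hypothetical, and it assumes a kernel that reports `MemAvailable`.

```shell
# Hypothetical helper: print MemAvailable in MiB from a meminfo-style file.
# Defaults to /proc/meminfo when no file is given.
mem_available_mib() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}
```

Running `mem_available_mib` while the web container is starting shows how much headroom is left. `dmesg | grep -i 'out of memory'` (which may need `sudo`) shows whether the kernel's OOM killer has terminated processes. In the log from the issue, the signal 9 lines directly follow `WORKER TIMEOUT`, so they may also come from gunicorn itself killing workers that did not finish starting in time.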

languagemaniac commented 10 months ago

That link is a bit too difficult for me to understand. Could you guide me through setting the worker and thread counts?

I'm running this on a Pi 3B+, and I guess it's because I only have 1 GB of RAM and too many things installed. I'm saving up to upgrade to something better.

languagemaniac commented 10 months ago

Never mind, I solved it. Now I have another issue; I will open a new issue to keep things clear.
