[new Rodan prod] rodan-main fails and celery containers won't start

At first I thought this was an Nginx problem, as in #1142. However, when I started Nginx manually inside its container I got "[emerg] host not found in upstream" and the log line "wait-for-app: timeout occurred after waiting 15 seconds for iipsrv:9003". On a second look, I found that rodan-main fails after several minutes (even when I set the Docker container to idle), and the containers for the Celery jobs never launch at all. This happens on both new VMs (the GPU one and the vGPU one). My guess is that Celery and Nginx both depend on rodan-main, which is not working.

The docker logs for rodan-main show that the container stops at these lines (*):

wait-for-app: waiting 15 seconds for postgres:5432
wait-for-app: postgres:5432 is available after 0 seconds
wait-for-app: waiting 15 seconds for redis:6379
wait-for-app: timeout occurred after waiting 15 seconds for redis:6379
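The wait-for-app lines suggest a readiness check that keeps trying to open a TCP connection to host:port until a 15-second deadline passes. Below is a minimal, hypothetical Python re-creation of that pattern, assuming /run/wait-for-app behaves like the common wait-for-it script (its actual source is not shown in this issue). The name wait_for is illustrative and not part of Rodan.

```python
import socket
import time


def wait_for(host: str, port: int, timeout: float = 15.0, interval: float = 1.0) -> bool:
    """Retry a TCP connect to host:port until it succeeds or `timeout` seconds pass.

    Hypothetical re-creation of a wait-for-it style check; the real
    /run/wait-for-app script may behave differently.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # corresponds to "timeout occurred after waiting ..."
        try:
            # Resolves the name first (inside a Docker network this usually
            # goes through Docker's embedded DNS), then opens a TCP connection.
            with socket.create_connection((host, port), timeout=min(interval, remaining)):
                return True  # corresponds to "... is available after N seconds"
        except OSError:
            # Unresolvable name, refused connection and connect timeout all
            # land here, so the caller only ever sees a timeout.
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

In this sketch, a hostname that does not resolve and a port nobody listens on end in the same "timeout" result, so a timeout line on its own does not tell which of the two is happening.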

It may be related to the new OS and GPU, but I'm not sure; this needs further investigation.

After updating all environment variables: in the rare cases where rodan-python3-celery did launch, it terminated with the following log:

+ mkdir -p /var/www
+ mkdir -p /code/Rodan/staticfiles
+ chmod -R a+rwx /rodan
+ chmod a+rwx /var
+ chmod a+rwx /code/Rodan/AUTHORS /code/Rodan/LICENSE /code/Rodan/__init__.py /code/Rodan/_clean_database.sh /code/Rodan/helper_scripts /code/Rodan/manage.py /code/Rodan/poetry.lock /code/Rodan/pyproject.toml /code/Rodan/readme.md /code/Rodan/requirements.txt /code/Rodan/rodan /code/Rodan/staticfiles /code/Rodan/websocket.ini
+ trap _term SIGTERM
+ cd /code/Rodan
+ /run/wait-for-app postgres:5432
wait-for-app: waiting 15 seconds for postgres:5432
wait-for-app: timeout occurred after waiting 15 seconds for postgres:5432

Meanwhile, rodan-main logs this error:

wait-for-app: waiting 15 seconds for postgres:5432
wait-for-app: timeout occurred after waiting 15 seconds for postgres:5432
wait-for-app: waiting 15 seconds for redis:6379
wait-for-app: timeout occurred after waiting 15 seconds for redis:6379
+ mkdir -p /var/www
+ mkdir -p /code/Rodan/staticfiles
+ chmod -R a+rwx /rodan
+ chmod a+rwx /var
+ chmod a+rwx /code/Rodan/AUTHORS /code/Rodan/LICENSE /code/Rodan/__init__.py /code/Rodan/_clean_database.sh /code/Rodan/helper_scripts /code/Rodan/manage.py /code/Rodan/poetry.lock /code/Rodan/pyproject.toml /code/Rodan/readme.md /code/Rodan/requirements.txt /code/Rodan/rodan /code/Rodan/staticfiles /code/Rodan/websocket.ini
+ trap _term SIGTERM
+ cd /code/Rodan
+ /run/wait-for-app postgres:5432
wait-for-app: waiting 15 seconds for postgres:5432
wait-for-app: timeout occurred after waiting 15 seconds for postgres:5432

However, postgres-plpython is healthy and returns the expected output. After a restart, rodan-main shows the same log as above (*).
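Since postgres-plpython reports healthy while rodan-main still times out on postgres:5432, it may help to separate the possible causes. The hypothetical probe helper below (illustrative only, not part of Rodan) first resolves the name and then attempts a TCP connection, so a DNS failure, comparable to Nginx's "host not found in upstream", can be told apart from a refused or silently dropped connection. It assumes Python 3 is available in the rodan-main image, which seems likely since that container runs manage.py.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify why host:port is (un)reachable from the current container.

    Returns one of "dns-failure", "refused", "timeout", "unreachable", "ok".
    These labels are illustrative, not the output of any Rodan tool.
    """
    try:
        # Step 1: name resolution only. A failure here is the same kind of
        # problem as Nginx's "host not found in upstream".
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    family, socktype, proto, _, addr = infos[0]
    # Step 2: the name resolved, so try a plain TCP connect to the first address.
    with socket.socket(family, socktype, proto) as s:
        s.settimeout(timeout)
        try:
            s.connect(addr)
        except ConnectionRefusedError:
            return "refused"  # host resolves, but nothing listens on that port
        except socket.timeout:
            return "timeout"  # packets silently dropped
        except OSError:
            return "unreachable"  # e.g. no route to host
    return "ok"
```

For example, the function could be pasted into python3 started via docker exec in the rodan-main container and called as probe("postgres", 5432) and probe("redis", 6379). A "dns-failure" result would point toward the service names or Docker networks rather than the databases themselves.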

DDMAL / Rodan #1145