Avans-ATGM / infrastructure

ATGM Infrastructure Repository

Spider: 502 Bad Gateway #4

Dirowa closed 1 year ago

Dirowa commented 3 years ago

Currently, when visiting the website, nginx returns a 502 Bad Gateway error.

Currently still investigating what is going wrong

sudo systemctl status galaxy produces the following error message:

(base) bioinf_team@galaxy://$ sudo systemctl status galaxy
● galaxy.service - Galaxy
     Loaded: loaded (/etc/systemd/system/galaxy.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2021-11-17 13:55:35 CET; 2h 57min ago
   Main PID: 3687353 (code=exited, status=1/FAILURE)
        CPU: 38ms

nov 17 13:55:35 galaxy.bioinformatics-atgm.nl systemd[1]: galaxy.service: Scheduled restart job, restart counter is at 5.
nov 17 13:55:35 galaxy.bioinformatics-atgm.nl systemd[1]: Stopped Galaxy.
nov 17 13:55:35 galaxy.bioinformatics-atgm.nl systemd[1]: galaxy.service: Start request repeated too quickly.
nov 17 13:55:35 galaxy.bioinformatics-atgm.nl systemd[1]: galaxy.service: Failed with result 'exit-code'.
nov 17 13:55:35 galaxy.bioinformatics-atgm.nl systemd[1]: Failed to start Galaxy.

So far I have been looking into the nginx roles.

Figuring out where it is going wrong is on my to-do list.

hexylena commented 3 years ago

Currently, when visiting the website, nginx returns a 502 Bad Gateway error.

That's not anything to do with nginx itself; it's entirely due to the backend being down (nginx is trying to contact a gateway but it's bad). Clearly it's because galaxy is down.
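To tell "nginx is broken" apart from "the backend is down", a quick TCP probe against the backend address can help. This is only a rough sketch: `backend_is_up` is a hypothetical helper name, and the address in the usage line is an assumption taken from the nginx config quoted later in this thread. It only checks that something accepts connections, not that Galaxy is healthy.

```python
import socket


def backend_is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening, or on timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed address: the uwsgi_pass target nginx is configured with.
    print("backend reachable:", backend_is_up("127.0.0.1", 8080))
```

If this prints `False` while nginx itself is running, a 502 is the expected response and the place to look is the backend service, not the proxy.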

Looking at the galaxy logs

root@galaxy:/srv/galaxy/server# journalctl -u galaxy  -f
-- Logs begin at Wed 2021-11-17 04:12:38 CET. --
nov 17 20:15:16 galaxy.bioinformatics-atgm.nl uwsgi[3932467]: lock engine: pthread robust mutexes
nov 17 20:15:16 galaxy.bioinformatics-atgm.nl uwsgi[3932467]: thunder lock: enabled
nov 17 20:15:16 galaxy.bioinformatics-atgm.nl uwsgi[3932467]: bind(): Permission denied [core/socket.c line 769]
nov 17 20:15:16 galaxy.bioinformatics-atgm.nl systemd[1]: galaxy.service: Main process exited, code=exited, status=1/FAILURE
nov 17 20:15:16 galaxy.bioinformatics-atgm.nl systemd[1]: galaxy.service: Failed with result 'exit-code'.
nov 17 20:15:16 galaxy.bioinformatics-atgm.nl systemd[1]: galaxy.service: Scheduled restart job, restart counter is at 5.
nov 17 20:15:16 galaxy.bioinformatics-atgm.nl systemd[1]: Stopped Galaxy.
nov 17 20:15:16 galaxy.bioinformatics-atgm.nl systemd[1]: galaxy.service: Start request repeated too quickly.
nov 17 20:15:16 galaxy.bioinformatics-atgm.nl systemd[1]: galaxy.service: Failed with result 'exit-code'.
nov 17 20:15:16 galaxy.bioinformatics-atgm.nl systemd[1]: Failed to start Galaxy.

Permission denied

Port 81 is a really bad choice! It should be a high port number (1024 or above). Port 81 is a privileged port, and unless the process is running as root, it cannot bind to it.

Also, the next problem will be that both Galaxy instances listen on the same port!
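As a rough illustration of the privileged-port rule, here is a small check that could be run against a `socket:` value before deploying. The helper names are made up for this sketch, and the 1024 cutoff is the conventional Linux default (it can be changed by a sysctl or bypassed with capabilities), so treat it as an assumption rather than a guarantee.

```python
from typing import Tuple

# Conventional default: ports below 1024 need root (or an equivalent
# capability) to bind. Assumed here, not read from the running system.
PRIVILEGED_PORT_LIMIT = 1024


def parse_socket(value: str) -> Tuple[str, int]:
    """Split a 'host:port' string such as '127.0.0.1:81'."""
    host, _, port = value.rpartition(":")
    return host, int(port)


def is_privileged(value: str) -> bool:
    """Return True if the port in 'host:port' is below the cutoff."""
    _, port = parse_socket(value)
    return port < PRIVILEGED_PORT_LIMIT


if __name__ == "__main__":
    for candidate in ("127.0.0.1:81", "127.0.0.1:8080"):
        print(candidate, "privileged" if is_privileged(candidate) else "ok")
```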

root@galaxy:/srv/galaxy/server# grep socket /srv/*galaxy/config/galaxy.yml
/srv/galaxy/config/galaxy.yml:    socket: 127.0.0.1:81
/srv/spider-galaxy/config/galaxy.yml:    socket: 127.0.0.1:81
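The same conflict can be spotted automatically by grouping the grep output by socket address. A minimal sketch, assuming the input looks exactly like the `grep socket` output above (`path:    socket: host:port`); `find_socket_conflicts` is a hypothetical helper, not part of Galaxy or the playbooks.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches lines like '/srv/galaxy/config/galaxy.yml:    socket: 127.0.0.1:81'
GREP_LINE_RE = re.compile(r"^(?P<path>[^:]+):\s*socket:\s*(?P<addr>\S+)\s*$")


def find_socket_conflicts(grep_output: str) -> Dict[str, List[str]]:
    """Map each socket address used by more than one config to its files."""
    by_addr: Dict[str, List[str]] = defaultdict(list)
    for line in grep_output.splitlines():
        match = GREP_LINE_RE.match(line.strip())
        if match:
            by_addr[match.group("addr")].append(match.group("path"))
    # Keep only addresses claimed by two or more config files.
    return {addr: paths for addr, paths in by_addr.items() if len(paths) > 1}


if __name__ == "__main__":
    sample = (
        "/srv/galaxy/config/galaxy.yml:    socket: 127.0.0.1:81\n"
        "/srv/spider-galaxy/config/galaxy.yml:    socket: 127.0.0.1:81\n"
    )
    for addr, paths in find_socket_conflicts(sample).items():
        print(addr, "is used by", ", ".join(paths))
```

An empty result means every instance has its own address; anything else lists the files that need a different port.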

And nginx ALSO cannot connect to anything there, because it's trying to connect to the wrong port:

location / {
    # This is the backend to send the requ
    uwsgi_pass 127.0.0.1:8080;
    uwsgi_param UWSGI_SCHEME $scheme;
    include uwsgi_params;
}

That definitely needs to be a different port.
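The mismatch between what nginx proxies to and what Galaxy listens on can be checked with a few lines as well. This is a sketch under assumptions: it uses a simple regex rather than a real nginx parser, so it only handles plain `uwsgi_pass host:port;` lines, and the function names are invented for illustration.

```python
import re
from typing import List

# Finds uncommented 'uwsgi_pass <target>;' directives, one per line.
UWSGI_PASS_RE = re.compile(r"^\s*uwsgi_pass\s+([^;\s]+)\s*;", re.MULTILINE)


def uwsgi_pass_targets(nginx_conf: str) -> List[str]:
    """Return every uwsgi_pass target found in the config text."""
    return UWSGI_PASS_RE.findall(nginx_conf)


def mismatched_targets(nginx_conf: str, galaxy_socket: str) -> List[str]:
    """Return the uwsgi_pass targets that differ from Galaxy's socket."""
    return [t for t in uwsgi_pass_targets(nginx_conf) if t != galaxy_socket]


if __name__ == "__main__":
    conf = """
    location / {
        uwsgi_pass 127.0.0.1:8080;
        include uwsgi_params;
    }
    """
    # Galaxy's configured socket, as reported by the grep earlier in the thread.
    print(mismatched_targets(conf, "127.0.0.1:81"))
```

Whatever port ends up being chosen, the value in `galaxy.yml` and the `uwsgi_pass` target have to be the same for each instance.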

Dirowa commented 3 years ago

I'll get on it and change it. Thank you for your help!

hexylena commented 3 years ago

I'm halfway done, I'll fix it this time if it's OK for you? Sorry for the interruption

hexylena commented 3 years ago

just don't want to waste your time!

Dirowa commented 3 years ago

sure go ahead!

hexylena commented 3 years ago

The SSL issue is separate, but you'll see that

nginx_servers:
  - galaxy
  - spider-galaxy

are used, rather than nginx_ssl_servers. I'll fix that separately; I'm trying to get proper certificates for the servers.

When checking the certificates: ansible.com.crt does not exist, but ssl-cert-snakeoil.key does (same for certs). Does this need to be edited?

That's going to take more time to figure out :( probably something stupid I did.

Edited nginx_ssl_role: in all.yml from self-signed-cert to galaxyproject.self_signed_certs due to an error in the playbook.

awesome, good catch!

The other important change is that I've moved spider galaxy to /spider so they won't conflict there either

Dirowa commented 3 years ago

Ah okay, I see some of the errors and why they occurred. What is the next plan of action?

hexylena commented 3 years ago

You:

Me:

Dirowa commented 1 year ago

Stale subject, no longer really needed. Can be picked up once the ports are opened by the DIF team.