DiSSCo / SDR

Specimen Data Refinery
Apache License 2.0

Evaluate and test the SDR deployment document #133

Open llivermore opened 1 year ago

llivermore commented 1 year ago

Naturalis will set up their own instance of the SDR using the current documentation. We will record (and respond to) any questions or feedback here.

TomDijkema commented 1 year ago

Hi Oliver/Laurence,

I am currently trying to run the SDR locally to test the implementation before setting it up on a remote server. So far this has raised some issues that the documentation did not directly address but that I was able to fix; I can list those if you like. However, I am now stuck at the last step: deploying the Ansible playbook (I skipped the SSL portion for my local installation).

To host the SDR locally I tried several things, starting with the suggested 'hosts' file, where I defined the localhost IP as the host:

127.0.0.1 ansible_connection=local

This keeps giving me the following result:

ansible-playbook deploy-galaxy.yml
[DEPRECATION WARNING]: include is deprecated, use include_tasks/import_tasks instead. See 
https://docs.ansible.com/ansible-core/2.14/user_guide/playbooks_reuse_includes.html for details. This feature will be
 removed in version 2.16. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[WARNING]: Could not match supplied host pattern, ignoring: galaxyservers
[WARNING]: Could not match supplied host pattern, ignoring: remoteservers

PLAY [galaxyservers,remoteservers] ***********************************************************************************
skipping: no hosts matched

PLAY RECAP ***********************************************************************************************************

I interpret this as either the host not being valid, or me doing something wrong in the way a deployment to localhost differs from one to a remote server. I also tried some other methods, such as the ones suggested in https://gist.github.com/alces/caa3e7e5f46f9595f715f0f55eef65c1, and tried changing the hosts variable in the deploy-galaxy.yml file to localhost, but with the same result.

Could you give me a hint on what I am doing wrong? Ansible and Galaxy are completely new to me, so I am probably overlooking something.

Best regards, Tom

OliverWoolland commented 1 year ago

Hello @TomDijkema! Thanks for trying out the process :) I have a few responses!

  1. I wouldn't personally try to use the Ansible scripts to deploy the SDR locally. The playbooks were not written with that in mind, and it would be hard to undo the changes if any problems are found. I'd suggest setting up a virtual machine and deploying to that.
  2. I suspect your specific issue here is a missing [remoteservers] group header in your hosts file (see here).
  3. I have recently updated the deployment documentation in response to some feedback! It might be worth giving it another skim.
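The warnings in the output above name the two groups the play targets: galaxyservers and remoteservers. A minimal sketch, assuming an INI-style Ansible inventory, that reports which of those groups a hosts file fails to define. The helper names are hypothetical and the group list is taken from the PLAY line in the output, not from the SDR documentation; 203.0.113.10 is a documentation-only placeholder address.

```python
from typing import Dict, List


def inventory_groups(text: str) -> Dict[str, List[str]]:
    """Parse an INI-style Ansible inventory into {group: [host lines]}.

    Hosts listed before any [group] header go into "ungrouped".
    Blank lines and comment lines (# or ;) are skipped.
    """
    groups: Dict[str, List[str]] = {"ungrouped": []}
    current = "ungrouped"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            groups.setdefault(current, [])
        else:
            groups[current].append(line)
    return groups


def missing_groups(text: str, required: List[str]) -> List[str]:
    """Return the required groups that are absent or contain no hosts."""
    groups = inventory_groups(text)
    return [g for g in required if not groups.get(g)]


if __name__ == "__main__":
    required = ["galaxyservers", "remoteservers"]
    # The hosts file from the first attempt: one host, no group headers.
    original = "127.0.0.1 ansible_connection=local\n"
    # Hypothetical layout: the same VM listed under both targeted groups.
    grouped = "[galaxyservers]\n203.0.113.10\n\n[remoteservers]\n203.0.113.10\n"
    print(missing_groups(original, required))  # ['galaxyservers', 'remoteservers']
    print(missing_groups(grouped, required))   # []
```

A host line outside any group header only lands in Ansible's implicit "ungrouped" group, which is why a play that targets named groups reports "no hosts matched".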
TomDijkema commented 1 year ago

Hi @OliverWoolland,

Thanks, we have set up a remote server, and this part now works as described in the manual. However, I am encountering a new error, which seems to originate in the Python code itself, when executing the second playbook for the enhanced SDR features (the first playbook finished successfully after a few attempts :) ). Could you please have a look at the error message and let me know whether this is a problem in the code or whether I need to change the configuration files? I've included the error message and the configuration values below.

Error:

TASK [Create bootstrap admin] ***************************************************************
fatal: [ip]: FAILED! => changed=true 
  cmd: |-
    python3 '/srv/SDR/deployment/galaxy-scripts/add-bootstrap-admin.py' --master-api-key 'xiqnaejull' --admin-email 'dissco@sdr.com' --admin-user 'ubuntu' --server 'localhost'
  delta: '0:00:00.481564'
  end: '2022-12-09 14:12:34.882872'
  msg: non-zero return code
  rc: 1
  start: '2022-12-09 14:12:34.401308'
  stderr: |-
    Traceback (most recent call last):
      File "/srv/SDR/deployment/galaxy-scripts/add-bootstrap-admin.py", line 20, in <module>
        gi = GalaxyInstance(url=args.server, key=args.master_api_key)
      File "/usr/local/lib/python3.10/dist-packages/bioblend/galaxy/__init__.py", line 83, in __init__
        super().__init__(url, key, email, password, verify=verify)
      File "/usr/local/lib/python3.10/dist-packages/bioblend/galaxyclient.py", line 67, in __init__
        raise ValueError(f"Missing scheme in url {url}")
    ValueError: Missing scheme in url localhost
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

PLAY RECAP **********************************************************************************
ip                : ok=14   changed=11   unreachable=0    failed=1    skipped=0    rescued=0    ignored=0    

Values in the encrypted sdr-secret.yml file:

api key: xiqnaejull
admin email: dissco@sdr.com
admin user: <user>
teklia:
bardecode:
ansible_ssh_pass: <server user password>
ansible_become_pass: <become pass>
vault_id_secret: <the vault password present in .vault-password.txt>
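Two of the entries above (teklia and bardecode) are blank. A minimal sketch that lists blank values in a flat key: value file, for example the plain-text output of ansible-vault view sdr-secret.yml. It is a deliberately naive parser (no nesting, lists or quoting), the function names are hypothetical, and whether those two keys may legitimately be left empty is not answered in this thread.

```python
from typing import Dict, List


def parse_flat_yaml(text: str) -> Dict[str, str]:
    """Parse flat 'key: value' lines; nested YAML is not supported."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so URL values stay intact.
        key, _, value = line.partition(":")
        values[key.strip()] = value.strip()
    return values


def empty_keys(values: Dict[str, str]) -> List[str]:
    """Return the keys whose value is an empty string."""
    return [key for key, value in values.items() if not value]
```

Feeding it the decrypted secrets would flag teklia and bardecode, which is a prompt to check the documentation rather than proof of an error.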
OliverWoolland commented 1 year ago

Hi @TomDijkema - excellent news that it has gone as planned. Please be a little careful posting IP addresses and passwords here!

Could you please double check you are working with the latest version of this repository? I would have expected this commit to solve your issue.

If you do have the latest version and the problem persists, I will have to look into it!
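The ValueError in the traceback comes from bioblend's GalaxyInstance rejecting a URL without a scheme ('localhost'); the later run in this thread passes --server 'http://localhost' instead. A minimal sketch of normalising such a value before it reaches GalaxyInstance(url=...); the function name is hypothetical, and defaulting to http is an assumption that suits a Galaxy reached on the same host.

```python
def ensure_scheme(server: str, default_scheme: str = "http") -> str:
    """Prefix a bare host such as 'localhost' with a URL scheme.

    Values that already contain '://' are returned unchanged
    (apart from surrounding whitespace being stripped).
    """
    server = server.strip()
    if "://" in server:
        return server
    return f"{default_scheme}://{server}"


if __name__ == "__main__":
    print(ensure_scheme("localhost"))                # http://localhost
    print(ensure_scheme("localhost:8080"))           # http://localhost:8080
    print(ensure_scheme("https://sdr.example.org"))  # https://sdr.example.org
```

Checking for '://' rather than using urllib.parse.urlparse avoids treating 'localhost:8080' as a URL whose scheme is 'localhost'.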

TomDijkema commented 1 year ago

Apologies, I forgot to redact them; I have edited the previous comment. I will check which version I pulled.

TomDijkema commented 1 year ago

OK, I pulled the latest version and retried. It now fails with a 502 Bad Gateway returned by the domain itself; if you navigate to https://sdr.dissco.tech (our domain for the SDR) you can see the error page. The nginx setup comes from running the first playbook, so maybe something went wrong there. I checked our server configuration: ports 80 and 443 should be accessible from anywhere, the SSL certificates are installed on the server (named foo, with the correct extensions), and the connection is verified.

I am not sure whether this is a problem with our server configuration or with the SDR; ports 80 and 443 appear to be up and served by the nginx master process. If it is our configuration, I will check this with Sam next week.

Error in playbook:

TASK [Create bootstrap admin] ********************************************************************************************************************************
fatal: [ip]: FAILED! => changed=true 
  cmd: |-
    python3 '/srv/SDR/deployment/galaxy-scripts/add-bootstrap-admin.py' --master-api-key 'xiqnaejull' --admin-email 'dissco@sdr.com' --admin-user '<user>' --server 'http://localhost'
  delta: '<delta>'
  end: '2022-12-09 15:18:47.793975'
  msg: non-zero return code
  rc: 1
  start: '2022-12-09 15:18:47.477817'
  stderr: |-
    Traceback (most recent call last):
      File "/srv/SDR/deployment/galaxy-scripts/add-bootstrap-admin.py", line 27, in <module>
        for existing_user in gi.users.get_users():
      File "/usr/local/lib/python3.10/dist-packages/bioblend/galaxy/users/__init__.py", line 74, in get_users
        return self._get(deleted=deleted, params=params)
      File "/usr/local/lib/python3.10/dist-packages/bioblend/galaxy/client.py", line 134, in _get
        raise ConnectionError(
    bioblend.ConnectionError: GET: error 502: b'<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx/1.18.0 (Ubuntu)</center>\r\n</body>\r\n</html>\r\n', 0 attempts left: <html>
    <head><title>502 Bad Gateway</title></head>
    <body>
    <center><h1>502 Bad Gateway</h1></center>
    <hr><center>nginx/1.18.0 (Ubuntu)</center>
    </body>
    </html>
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>

PLAY RECAP ***************************************************************************************************************************************************
ip                : ok=13   changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0 
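A 502 Bad Gateway from nginx means nginx accepted the request but could not get a valid response from the upstream it proxies to, which for Galaxy is Gunicorn. A minimal standard-library sketch that requests a URL once and labels the outcome; the helper names are hypothetical, and using Galaxy's /api/version path as the probe target is an assumption.

```python
import urllib.error
import urllib.request


def describe_status(status: int) -> str:
    """Map an HTTP status seen at the nginx front end to a rough diagnosis."""
    if 200 <= status < 300:
        return "ok: nginx and the application both responded"
    if status in (502, 504):
        return "nginx is up, but got no valid response from the upstream (Gunicorn)"
    return f"unexpected status {status}"


def probe(url: str, timeout: float = 10.0) -> str:
    """Request url once and describe the outcome, including network failures."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return describe_status(resp.status)
    except urllib.error.HTTPError as err:
        # Raised for non-2xx responses such as the 502 above; err.code holds the status.
        return describe_status(err.code)
    except urllib.error.URLError as err:
        # DNS failure, refused connection, TLS problem: no HTTP status at all.
        return f"no HTTP response: {err.reason}"


# Example (hypothetical endpoint, run on the server itself):
# print(probe("http://localhost/api/version"))
```

HTTPError is caught before URLError because it is a subclass of it; reversing the order would hide the status code.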
OliverWoolland commented 1 year ago

OK, thanks for trying that. Did you rerun only deploy-sdr.yml, or did you run deploy-galaxy.yml as well?

If only deploy-sdr.yml, could I suggest rerunning deploy-galaxy.yml first and seeing if that helps?

TomDijkema commented 1 year ago

I think I did rerun it, but let me try again.

OliverWoolland commented 1 year ago

Sadly, the 502 is rather hard to debug. If needed, a procedure I have used is:

TomDijkema commented 1 year ago

OK, when I manually start the Galaxy server as you described, it gives the following logs:

supervisord is not running
supervisord is not running
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/gunicorn.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/celery.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/celery-beat.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/handler_0.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/handler_1.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/handler_2.log' for reading: No such file or directory
/usr/bin/tail: no files remaining
2022-12-09 16:09:42,001 WARN No file matches via include "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/*.conf"
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_celery-beat_celery-beat.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_celery_celery.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_gunicorn_gunicorn.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_standalone_handler_0.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_standalone_handler_1.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_standalone_handler_2.conf" during parsing
Error: Another program is already listening on a port that one of our HTTP servers is configured to use.  Shut this program down first before starting supervisord.
For help, use /srv/galaxy/venv/bin/supervisord -h

I guess some other service is already running on the port supervisord wants to use. Which port should it run on according to your configuration?

OliverWoolland commented 1 year ago

Thanks for trying that :) I agree that looks like a problem! The setup uses standard ports, so I would expect 80 or 443.

TomDijkema commented 1 year ago

Hmm, strange. The server shows nginx listening on those ports, but that seems intentional, no?

tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      23194/nginx: master 
tcp6       0      0 :::80                   :::*                    LISTEN      23194/nginx: master 
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      23194/nginx: master 
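To check which ports are already bound before starting supervisord again, the standard socket module is enough. A minimal sketch; the helper names are hypothetical, and the port list only covers 80 and 443 from the netstat output above. One hedged caveat: as far as I understand supervisord's source, this particular message refers to the address of its own HTTP control server ([inet_http_server] or [unix_http_server] in its configuration), so nginx holding 80 and 443 is not necessarily the conflict; another supervisord already running is one possibility.

```python
import socket
from typing import Dict, Iterable


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            # connect_ex returns 0 on success instead of raising.
            return sock.connect_ex((host, port)) == 0
        except OSError:
            # e.g. a timeout or an unreachable host
            return False


def check_ports(ports: Iterable[int], host: str = "127.0.0.1") -> Dict[int, bool]:
    """Map each port to whether it is currently accepting connections."""
    return {port: port_in_use(port, host) for port in ports}


if __name__ == "__main__":
    # Add any port found in the supervisord configuration under /srv/galaxy/var/gravity/.
    for port, busy in check_ports([80, 443]).items():
        print(f"{port}: {'in use' if busy else 'free'}")
```

This only detects TCP listeners; a UNIX socket used by supervisord would need a separate check.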
OliverWoolland commented 1 year ago

For that to be a problem seems very strange to me.

Maybe bring nginx down too (sudo systemctl stop nginx) and then restart Galaxy with galaxyctl?

OliverWoolland commented 1 year ago

You could also dump the nginx configuration with sudo nginx -T to check that the Galaxy configuration is there.

TomDijkema commented 1 year ago

The Galaxy configuration is present in nginx.conf, in the last portion of the file; a comment there says this section is maintained by Ansible.
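Since a 502 means the proxied upstream did not answer properly, it can help to see exactly where nginx forwards Galaxy traffic and compare that with the address Gunicorn should bind to. A minimal sketch that scans the output of nginx -T for proxy_pass targets; it is a simple regular expression rather than a real nginx parser, the helper names are hypothetical, and the sample configuration is made up for illustration.

```python
import re
import subprocess
from typing import List

# Matches uncommented lines like "proxy_pass http://127.0.0.1:8080;" and captures the target.
PROXY_PASS = re.compile(r"^\s*proxy_pass\s+([^;\s]+)\s*;", re.MULTILINE)


def proxy_targets(config_text: str) -> List[str]:
    """Return every proxy_pass target found in an nginx configuration dump."""
    return PROXY_PASS.findall(config_text)


def dump_nginx_config() -> str:
    """Run 'nginx -T' (needs root) and return the configuration dump."""
    result = subprocess.run(["nginx", "-T"], capture_output=True, text=True, check=True)
    # The configuration itself is printed on stdout; the syntax check goes to stderr.
    return result.stdout


if __name__ == "__main__":
    sample = """
    server {
        listen 443 ssl;
        location / {
            proxy_pass http://127.0.0.1:8080;
        }
    }
    """
    print(proxy_targets(sample))  # ['http://127.0.0.1:8080']
```

On the server, proxy_targets(dump_nginx_config()) would list the real targets; if none of them match where Gunicorn listens (or Gunicorn is not running at all), nginx would answer 502.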

TomDijkema commented 1 year ago

Anyway, thanks for your help until now! Will continue on Monday.

TomDijkema commented 1 year ago

Hi @OliverWoolland ,

After some inspection we think we have narrowed the problem down to a missing Gunicorn instance, which already causes errors while running the first playbook (the Galaxy one). On Friday, when I ran the playbook a second time, it somehow ignored this error, which probably did not help the second playbook. Galaxy appears to run on Gunicorn, so it cannot be left out. In one of my previous comments, running the Galaxy instance on its own gave a similar issue, reporting that supervisord was not running.

Here is the error the first playbook gives:

RUNNING HANDLER [galaxyproject.galaxy : galaxy gravity restart] *************************************************
fatal: [ip]: FAILED! => changed=true 
  cmd:
  - /srv/galaxy/venv/bin/galaxyctl
  - graceful
  delta: '0:00:00.535848'
  end: '2022-12-12 08:13:37.619923'
  msg: non-zero return code
  rc: 1
  start: '2022-12-12 08:13:37.084075'
  stderr: ''
  stderr_lines: <omitted>
  stdout: |-
    gunicorn: ERROR (not running)
    gunicorn: ERROR (no such file)
  stdout_lines: <omitted>

PLAY RECAP ******************************************************************************************************
ip                : ok=112  changed=6    unreachable=0    failed=1    skipped=63   rescued=0    ignored=0 
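The failing handler reports its problem through supervisor-style status lines of the form "name: ERROR (reason)". In supervisor's terms, "no such file" typically means the executable configured for the program could not be found, which fits the missing-Gunicorn diagnosis. A minimal sketch that pulls (process, reason) pairs out of such output, for example the stdout of galaxyctl graceful shown above; it relies only on the line format visible in this log and is not a galaxyctl API.

```python
import re
from typing import List, Tuple

# Matches lines like "gunicorn: ERROR (not running)"; names may contain ':' (e.g. "galaxy:gunicorn").
ERROR_LINE = re.compile(r"^\s*([\w.:-]+?):\s+ERROR\s+\((.*)\)\s*$")


def failed_processes(output: str) -> List[Tuple[str, str]]:
    """Return (process, reason) pairs for every ERROR line in the output."""
    failures = []
    for line in output.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            failures.append((match.group(1), match.group(2)))
    return failures


if __name__ == "__main__":
    stdout = "gunicorn: ERROR (not running)\ngunicorn: ERROR (no such file)"
    print(failed_processes(stdout))
    # [('gunicorn', 'not running'), ('gunicorn', 'no such file')]
```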
OliverWoolland commented 1 year ago

Hi @TomDijkema, I hope you had a nice weekend. The playbook should handle the creation and setup of the Gunicorn instance. I wonder if this could be linked to the playbook having run with an old version first.

If you've not tried it already, could you try running both playbooks again on a fresh (Ubuntu 20.04) VM? After the first playbook runs, you should be able to find a (blank) instance of Galaxy running at your URL.
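For the clean rerun the order matters: deploy-galaxy.yml first, then deploy-sdr.yml, and the second should only start once the first has succeeded. A minimal sketch that builds and runs both ansible-playbook commands and stops at the first failure. The inventory name (hosts) and the optional vault password file (.vault-password.txt) are assumptions based on names mentioned in this thread, and -i may be unnecessary if the repository's ansible.cfg already points at the inventory.

```python
import subprocess
from typing import List, Optional

PLAYBOOKS = ["deploy-galaxy.yml", "deploy-sdr.yml"]


def playbook_command(
    playbook: str, inventory: str = "hosts", vault_file: Optional[str] = None
) -> List[str]:
    """Build the ansible-playbook command line for one playbook."""
    cmd = ["ansible-playbook", "-i", inventory, playbook]
    if vault_file:
        cmd += ["--vault-password-file", vault_file]
    return cmd


def run_in_order(inventory: str = "hosts", vault_file: Optional[str] = None) -> bool:
    """Run the playbooks in sequence; return False at the first failure."""
    for playbook in PLAYBOOKS:
        cmd = playbook_command(playbook, inventory, vault_file)
        print("running:", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            print(f"{playbook} failed; not starting the next playbook")
            return False
    return True
```

Calling run_in_order(vault_file=".vault-password.txt") from the deployment directory would run both steps; stopping early avoids deploy-sdr.yml building on a half-configured Galaxy, as happened earlier in this thread.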

TomDijkema commented 1 year ago

The weekend was great! Hopefully yours was too.

OK, I will set up a new VM and install the SDR there to ensure we have a clean instance.

TomDijkema commented 1 year ago

Just to be sure: the ansible_ssh_pass variable in the secrets file needs to contain the SSH password of the user on the remote machine. What exactly is your definition of this (I am not 100% sure)?

TomDijkema commented 1 year ago

Hi @OliverWoolland,

Here are the notes I took during the setup of the SDR. I have tried to summarize them, so let me know if anything needs further clarification.

Overview
The setup of the SDR is in principle not very difficult, but it does require attention to detail and some advanced knowledge of server management, using bash and setting up SSL certificates. The biggest hurdle we came across was wanting to go too fast, which led to some complications in the deployment. After contacting Oliver Woolland, who was very responsive and helpful, we managed to fix the issues fairly quickly and get it working.

Notes
The current documentation on setting up the SDR is short but touches on all the necessary topics. Some improvements could be made; we state our suggestions in the bullet points below: