Open llivermore opened 1 year ago
Hi Oliver/Laurence,
I am currently trying to run the SDR locally to test the implementation before setting it up on a remote server. So far I have run into some issues that the documentation did not directly address but that I could fix myself; I can list those if you like. However, I am now stuck at the last step: the deployment of the Ansible playbook (I skipped the SSL portion for my local installation).
In an attempt to host the SDR locally I tried several things, starting with the suggested 'hosts' file, where I defined the localhost IP as the host:
127.0.0.1 ansible_connection=local
This keeps giving me the following result:
ansible-playbook deploy-galaxy.yml
[DEPRECATION WARNING]: include is deprecated, use include_tasks/import_tasks instead. See
https://docs.ansible.com/ansible-core/2.14/user_guide/playbooks_reuse_includes.html for details. This feature will be
removed in version 2.16. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[WARNING]: Could not match supplied host pattern, ignoring: galaxyservers
[WARNING]: Could not match supplied host pattern, ignoring: remoteservers
PLAY [galaxyservers,remoteservers] ***********************************************************************************
skipping: no hosts matched
PLAY RECAP ***********************************************************************************************************
I interpret this as the host not being valid, or as me doing something wrong in how this should be deployed on localhost compared to a remote server. I also tried some other methods, like the ones suggested in https://gist.github.com/alces/caa3e7e5f46f9595f715f0f55eef65c1, and for example tried changing the hosts variable in the deploy-galaxy.yml file to localhost, but with the same result.
Could you give me a hint on what I am doing wrong? Ansible and Galaxy are completely new to me, so I am probably overlooking something.
Best regards, Tom
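The two "Could not match supplied host pattern" warnings above mean that the inventory contains nothing named galaxyservers or remoteservers, which is what a bare host line without a group header produces. A minimal Python sketch for checking an INI-style hosts file for those groups before running the playbook; the helper names and the sample inventories are hypothetical, and the required group names are taken from the PLAY line in the log:

```python
import re

# Group names taken from the PLAY line in the log above.
REQUIRED_GROUPS = {"galaxyservers", "remoteservers"}

# Matches a plain [group] or [group:children] header.
# Other suffixed sections (e.g. [group:vars]) are ignored in this sketch.
_HEADER = re.compile(r"^\s*\[([^\]:]+)(?::children)?\]\s*$")


def inventory_groups(text: str) -> set[str]:
    """Return the group names declared in an INI-style inventory."""
    groups = set()
    for line in text.splitlines():
        match = _HEADER.match(line)
        if match:
            groups.add(match.group(1).strip())
    return groups


def missing_groups(text: str, required=REQUIRED_GROUPS) -> set[str]:
    """Return the required groups that the inventory does not declare."""
    return set(required) - inventory_groups(text)


# Hypothetical inventories, for illustration only.
bare_hosts = "127.0.0.1 ansible_connection=local\n"
grouped_hosts = (
    "[galaxyservers]\n"
    "127.0.0.1 ansible_connection=local\n"
    "\n"
    "[remoteservers]\n"
    "127.0.0.1 ansible_connection=local\n"
)

print(sorted(missing_groups(bare_hosts)))     # both groups are missing
print(sorted(missing_groups(grouped_hosts)))  # nothing is missing
```

Whether both groups should point at the same machine for a local test is a question for the SDR documentation; the sketch only checks that the names exist.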
Hello @TomDijkema! Thanks for trying out the process :) I have a few responses!
[remoteservers] tag in your hosts file? (see here)
Hi @OliverWoolland,
Thanks, we have set up a remote server and this part now works as described in the manual. I am, however, encountering a new error, which seems to originate from the Python code itself, when executing the second playbook for the enhanced SDR features (the first playbook finished successfully after some attempts :) ). Could you please have a look at the error message and let me know whether something is wrong in the code or whether I need to change the configuration files? I have included the error message and configuration values below.
Error:
TASK [Create bootstrap admin] ***************************************************************
fatal: [ip]: FAILED! => changed=true
cmd: |-
python3 '/srv/SDR/deployment/galaxy-scripts/add-bootstrap-admin.py' --master-api-key 'xiqnaejull' --admin-email 'dissco@sdr.com' --admin-user 'ubuntu' --server 'localhost'
delta: '0:00:00.481564'
end: '2022-12-09 14:12:34.882872'
msg: non-zero return code
rc: 1
start: '2022-12-09 14:12:34.401308'
stderr: |-
Traceback (most recent call last):
File "/srv/SDR/deployment/galaxy-scripts/add-bootstrap-admin.py", line 20, in <module>
gi = GalaxyInstance(url=args.server, key=args.master_api_key)
File "/usr/local/lib/python3.10/dist-packages/bioblend/galaxy/__init__.py", line 83, in __init__
super().__init__(url, key, email, password, verify=verify)
File "/usr/local/lib/python3.10/dist-packages/bioblend/galaxyclient.py", line 67, in __init__
raise ValueError(f"Missing scheme in url {url}")
ValueError: Missing scheme in url localhost
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
PLAY RECAP **********************************************************************************
ip : ok=14 changed=11 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
Values in the encrypted sdr-secret.yml file:
api key: xiqnaejull
admin email: dissco@sdr.com
admin user: <user>
teklia:
bardecode:
ansible_ssh_pass: <server user password>
ansible_become_pass: <become pass>
vault_id_secret: <the vault password present in .vault-password.txt>
Hi @TomDijkema - excellent news that it has gone as planned. Please be a little careful posting IP addresses and passwords here!
Could you please double check you are working with the latest version of this repository? I would have expected this commit to solve your issue.
If you do have the latest version and the problem persists, I will have to look into it!
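For reference, the "Missing scheme in url localhost" error in the traceback above is bioblend rejecting a server address that has no http:// or https:// prefix. A minimal sketch of how a calling script could normalise the address before constructing GalaxyInstance; the with_scheme helper and its default are assumptions for illustration, not necessarily what the referenced commit does:

```python
def with_scheme(url: str, default_scheme: str = "http") -> str:
    """Prefix url with default_scheme:// unless it already names a scheme."""
    url = url.strip()
    if "://" in url:
        return url
    return f"{default_scheme}://{url}"


print(with_scheme("localhost"))                # http://localhost
print(with_scheme("https://sdr.dissco.tech"))  # https://sdr.dissco.tech

# Hypothetical use in a script like add-bootstrap-admin.py:
# gi = GalaxyInstance(url=with_scheme(args.server), key=args.master_api_key)
```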
Apologies, forgot to mute them, edited previous comment. I will check the version I pulled.
Ok, I pulled the latest version and retried. It now encounters a 502 Bad Gateway, originating from the domain itself. If you navigate to https://sdr.dissco.tech (our domain for sdr) you can see the error code. This is the nginx result from running the first playbook, maybe something in there went wrong. Checked our server config but ports 80 and 443 should be accessible from anywhere, also included the SSL certificates in the server (called foo with the correct extension) and connection is verified.
I am not sure if this is a problem with our server config or SDR, it looks like ports 80 and 443 are up and running with nginx master just fine. If so I will check this with Sam next week.
Error in playbook:
TASK [Create bootstrap admin] ********************************************************************************************************************************
fatal: [ip]: FAILED! => changed=true
cmd: |-
python3 '/srv/SDR/deployment/galaxy-scripts/add-bootstrap-admin.py' --master-api-key 'xiqnaejull' --admin-email 'dissco@sdr.com' --admin-user '<user>' --server 'http://localhost'
delta: '<delta>'
end: '2022-12-09 15:18:47.793975'
msg: non-zero return code
rc: 1
start: '2022-12-09 15:18:47.477817'
stderr: |-
Traceback (most recent call last):
File "/srv/SDR/deployment/galaxy-scripts/add-bootstrap-admin.py", line 27, in <module>
for existing_user in gi.users.get_users():
File "/usr/local/lib/python3.10/dist-packages/bioblend/galaxy/users/__init__.py", line 74, in get_users
return self._get(deleted=deleted, params=params)
File "/usr/local/lib/python3.10/dist-packages/bioblend/galaxy/client.py", line 134, in _get
raise ConnectionError(
bioblend.ConnectionError: GET: error 502: b'<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx/1.18.0 (Ubuntu)</center>\r\n</body>\r\n</html>\r\n', 0 attempts left: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.18.0 (Ubuntu)</center>
</body>
</html>
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
PLAY RECAP ***************************************************************************************************************************************************
ip : ok=13 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0
Ok thanks for trying that, did you rerun only deploy-sdr.yml or did you run deploy-galaxy.yml as well?
If only deploy-sdr can I suggest rerunning deploy-galaxy first and seeing if that helps?
I think I did rerun it, but let me try it again.
Sadly the 502 is rather hard to debug. If needed, a procedure I have used is:
sudo systemctl stop galaxy
sudo su galaxy
cd /srv/galaxy/
source venv/bin/activate
/srv/galaxy/venv/bin/galaxyctl start --foreground
Ok, when I manually start the Galaxy server as you described it gives the following logs:
supervisord is not running
supervisord is not running
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/gunicorn.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/celery.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/celery-beat.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/handler_0.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/handler_1.log' for reading: No such file or directory
/usr/bin/tail: cannot open '/srv/galaxy/var/gravity/log/handler_2.log' for reading: No such file or directory
/usr/bin/tail: no files remaining
2022-12-09 16:09:42,001 WARN No file matches via include "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/*.conf"
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_celery-beat_celery-beat.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_celery_celery.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_gunicorn_gunicorn.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_standalone_handler_0.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_standalone_handler_1.conf" during parsing
2022-12-09 16:09:42,001 INFO Included extra file "/srv/galaxy/var/gravity/supervisor/supervisord.conf.d/_default_.d/galaxy_standalone_handler_2.conf" during parsing
Error: Another program is already listening on a port that one of our HTTP servers is configured to use. Shut this program down first before starting supervisord.
For help, use /srv/galaxy/venv/bin/supervisord -h
I guess some other service is running on the port supervisord wants to run, on which port should it run according to your configuration?
Thanks for trying that :) I agree that looks like a problem! The set up uses standard ports so 80 or 443 I would expect
Hmm, strange. Server states nginx running on the ports, but that does seem intentional no?
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 23194/nginx: master
tcp6 0 0 :::80 :::* LISTEN 23194/nginx: master
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 23194/nginx: master
For that to be a problem seems very strange to me.
Maybe bring nginx down too, with: sudo systemctl stop nginx
and then restart galaxyctl.
You could also dump the nginx config to check that the Galaxy configuration is there, with: sudo nginx -T
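As a complement to the netstat output above, a small Python sketch that probes whether anything is accepting TCP connections on a given port. The supervisord error does not name the port it wants, so probing 80 and 443 (the ports mentioned in this thread) is an assumption; the port_in_use helper is hypothetical:

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports mentioned in the thread; adjust to whatever supervisord is configured to use.
    for port in (80, 443):
        state = "in use" if port_in_use("127.0.0.1", port) else "free"
        print(f"port {port}: {state}")
```

Running it once with nginx up and once after stopping nginx shows whether nginx is the only listener on those ports.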
Galaxy is present in the nginx.conf, in the last portion of the file, it mentions this section is maintained by Ansible
Anyway, thanks for your help until now! Will continue on Monday.
Hi @OliverWoolland ,
After some inspection we think we have narrowed the problem down to a missing Gunicorn instance, which already causes errors while running the first playbook (the Galaxy one). On Friday, when I ran the playbook twice, the second run somehow ignored this error, which probably did not help the second playbook. It seems Galaxy runs on Gunicorn, so it cannot be missing. One of my previous comments, from when I started the Galaxy instance by itself, showed a similar issue: it reported that supervisord was not running.
Here is the error the first playbook gives:
RUNNING HANDLER [galaxyproject.galaxy : galaxy gravity restart] *************************************************
fatal: [ip]: FAILED! => changed=true
cmd:
- /srv/galaxy/venv/bin/galaxyctl
- graceful
delta: '0:00:00.535848'
end: '2022-12-12 08:13:37.619923'
msg: non-zero return code
rc: 1
start: '2022-12-12 08:13:37.084075'
stderr: ''
stderr_lines: <omitted>
stdout: |-
gunicorn: ERROR (not running)
gunicorn: ERROR (no such file)
stdout_lines: <omitted>
PLAY RECAP ******************************************************************************************************
ip : ok=112 changed=6 unreachable=0 failed=1 skipped=63 rescued=0 ignored=0
Hi @TomDijkema, I hope you had a nice weekend. The playbook should handle the creation and setup of the Gunicorn instance. I wonder if this could be linked to the playbook having run with an old version first.
If you've not tried it already, could you try running both playbooks again on a fresh (Ubuntu 20.04) VM? After the first playbook runs you should be able to find a (blank) instance of Galaxy running at your URL.
The weekend was great! Hopefully yours was too.
Ok, I shall try and set up a new VM to install the SDR to ensure we have a clean instance.
Just to be sure: the ansible_ssh_pass variable in the secrets file, it needs to contain the Remote machine ssh user password. What is your exact definition of this (because I am not 100% sure)?
Hi @OliverWoolland,
Here are the notes I took during the set up of the SDR. Tried to summarize them, so let me know if I need to further clarify something.
Overview
The setup of the SDR is in principle not very difficult, but it does require attention to detail and some advanced knowledge of server management, using bash and setting up SSL certificates. The biggest hurdle we came across was wanting to go too fast, which led to some complications in the deployment. After contact with Oliver Woolland, who was very responsive and helpful, we managed to fix the issues pretty quickly and get it to work.
Notes
The current documentation on the setup of the SDR is short, but touches on all the necessary topics. Some improvements could be made. We state our suggestions in the bullet points below:
A bit more context about the function of the host and remote machine and how Ansible is used to deploy the SDR from a local system to a remote one (reference to the: Ansible for the SDR page). For example: at the start, we were not sure whether to pull the repository to the local system or to the server environment (lack of Ansible knowledge).
Probably best to list or use bullet points for the requirements of the host machine as well. At first we overlooked these requirements (Ansible (>= 2.12), sshpass, pip installed).
Would recommend to list the commands for the secret parameters one at a time and conclude with the full example. Now they are stated above the explanation which can lead to the user just plainly executing these commands all at once. Would be nice if you could say for example: now, insert the required parameters as listed below (list of secret parameters) by opening the file with this command (nano command).
Would also move the reference to the secret parameters to the top of the paragraph.
The secret parameters reference is good, but we think it could include a bit more context per parameter: what Teklia / Bardecode are (reference to the: SDR tools technical page), and a somewhat broader description of ansible_ssh_pass and ansible_become_pass.
The vault id of course refers to the random string that was generated with the first command (openssl rand -base64 24 > .vault-password.txt); it is probably best to add to the description that the value from the .vault-password.txt file should be copied to the vault id in the secrets file. The vault id is referred to both as a secret and as a password, which amounts to the same thing but may be confusing.
Please state where to create the hosts file (in the root directory).
At the creation of the hosts file it is stated that an IP is acceptable to define the remote server. This will, however, result in a nonfunctional data upload function, because the SSL certificate cannot be tied to the IP and should instead be tied to a domain.
Mention default examples of the ssh_user like ubuntu on Ubuntu for example.
Generation of SSL certificates requires, of course, some technical knowledge. This may be a hiccup for inexperienced users trying to set up the SDR. However, it is questionable whether the SDR will ever be set up by these kinds of users rather than always being deployed by a technical team.
Please state that the name of the certificate files should in fact be ‘foo’. Because foo is commonly used as a placeholder, it reads as if you have to choose a name yourself. It may also be good to mention that the name can be changed, but that this requires the user to change the reference name in the corresponding file as well (we did see it was defined somewhere).
When deploying the first Ansible playbook, please mention that the user should disable nginx, if installed, by using: service nginx stop. Otherwise it will somehow conflict with the Galaxy configuration and display the 502 error. Incidentally, the 502 error is always displayed right after deploying, but a refresh after about ten seconds shows the Galaxy page (nginx probably takes some time).
The first Ansible playbook can fail the first time it is run, but a second try can do the trick.
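The note above about the 502 page giving way to Galaxy after roughly ten seconds suggests polling the site before starting the second playbook. A minimal Python sketch of such a wait loop; the function names, the number of attempts, the delay and the example URL are all hypothetical choices, not part of the SDR deployment:

```python
import time
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error responses such as 502 are raised as HTTPError; keep the code.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def wait_until_ready(fetch_status, attempts: int = 6, delay: float = 5.0,
                     sleep=time.sleep) -> bool:
    """Poll fetch_status() until it returns 200 or the attempts run out."""
    for attempt in range(attempts):
        if fetch_status() == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example (the URL is a placeholder):
# ready = wait_until_ready(lambda: http_status("https://galaxy.example.org"))
```

Passing the status check in as a callable keeps the loop easy to test without a running server.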
Naturalis will set up their own instance of the SDR using current documentation. We will record (and respond) to any questions or feedback here.