galaxyproject / training-material

A collection of Galaxy-related training material
https://training.galaxyproject.org
MIT License
294 stars 846 forks

TODOs from Barcelona #1809

Closed natefoo closed 1 year ago

natefoo commented 4 years ago

An issue for collecting things we notice during the 2020 Galaxy Admin Training in Barcelona that need to be fixed

Not admin-related:

lldelisle commented 4 years ago

In the "Galaxy installation with Ansible" tutorial, the following part is duplicated:

Galaxy is now configured with an admin user, a database, and a place to store data. Additionally we’ve immediately configured the mules for production Galaxy serving. So we’re ready to set up supervisord which will manage the Galaxy processes!

Hands-on: (Optional) Launching uWSGI by hand

    SSH into your server
    Switch user to Galaxy account (sudo -iu galaxy)
    Change directory into /srv/galaxy/server
    Activate virtualenv (. ../venv/bin/activate)
    uwsgi --yaml ../config/galaxy.yml
    Access at port <ip address>:8080 once the server has started

natefoo commented 4 years ago

@lldelisle thanks!

nsoranzo commented 4 years ago

@lldelisle That was fixed already in #1810

hexylena commented 4 years ago

validate job xml etc against the definition

lldelisle commented 4 years ago

In https://training.galaxyproject.org/training-material/topics/admin/tutorials/connect-to-compute-cluster/tutorial.html#a-dynamic-destination: use a different name for the group ID.

natefoo commented 4 years ago

@lldelisle thanks, we added this as "Stop re-using IDs between sections (aka don't use the same values for runner IDs, destination IDs, job resource IDs, etc.)"

hexylena commented 4 years ago

Writing in my own comment, lest any updates conflict or be overwritten

lldelisle commented 4 years ago

typo in https://galaxyproject.github.io//training-material/topics/admin/tutorials/pulsar/tutorial.html#testing-pulsar journalctcl -fu galaxy instead of journalctl -fu galaxy

nsoranzo commented 4 years ago

> typo in https://galaxyproject.github.io//training-material/topics/admin/tutorials/pulsar/tutorial.html#testing-pulsar journalctcl -fu galaxy instead of journalctl -fu galaxy

Thanks, will be fixed by https://github.com/galaxyproject/training-material/pull/1822

ondrejme commented 4 years ago

In "Connect to compute", citing from the hands-on tutorial:

if the folder does not exist, create files/galaxy/config next to your playbook.yml (mkdir -p files/galaxy/config/)

The playbook name should probably change to galaxy.yml, since other tutorials reference it.

hexylena commented 4 years ago

Thanks @ondrejme!

lldelisle commented 4 years ago

Change the short help of the local gxadmin functions in https://training.galaxyproject.org/training-material/topics/admin/tutorials/gxadmin/tutorial.html:

local_hello() { ## hello: Says hi
->
local_hello() { ## : Says hi

local_query-latest() { ## query-latest [jobs|10]: Queries latest N jobs (default to 10)
->
local_query-latest() { ## [jobs|10]: Queries latest N jobs (default to 10)

nsoranzo commented 4 years ago

"Invalid username or password" when grafana starts, maybe due to: grafana_url: "https:///grafana/" in https://training.galaxyproject.org/training-material/topics/admin/tutorials/monitoring/tutorial.html

nsoranzo commented 4 years ago

> Connect to compute Citing from the hands-on tutorial:
>
> if the folder does not exist, create files/galaxy/config next to your playbook.yml (mkdir -p files/galaxy/config/)
>
> The playbook name should probably change to galaxy.yml, since other tutorials reference it.

@ondrejme Thanks, it will be addressed by https://github.com/galaxyproject/training-material/pull/1829 .

lldelisle commented 4 years ago

In https://training.galaxyproject.org/training-material/topics/admin/tutorials/ansible-galaxy/tutorial.html#postgresql: at the beginning of the tutorial (when setting up postgres) we had in group_vars/galaxyservers.yml:

# Python 3 support
pip_virtualenv_command: /usr/bin/python3 -m virtualenv # usegalaxy_eu.certbot, usegalaxy_eu.tiaas2, galaxyproject.galaxy
certbot_virtualenv_package_name: python3-virtualenv    # usegalaxy_eu.certbot
pip_package: python3-pip                               # geerlingguy.pip

Then when we set galaxy_config and uwsgi, the solution shows something which begins with:

# python3 support
pip_virtualenv_command: virtualenv

I guess this is not expected...

lldelisle commented 4 years ago

In the same solution, it is written: galaxy_user: {name: galaxy, shell: /bin/bash, home: "{{ galaxy_root }}"}

Whereas in the table above it is written: {name: galaxy, shell: /bin/bash}

hexylena commented 4 years ago

> home: "{{ galaxy_root }}"}

Wow, @lldelisle you found it. It looks like I added it, a long time ago. I really don't know how that happened. Ok, amazing, thank you. We will make sure those snippets are in sync in the future.

lldelisle commented 4 years ago

I found a journalctf -u galaxy -f instead of journalctl -u galaxy -f in https://training.galaxyproject.org/training-material/topics/admin/tutorials/tiaas/tutorial.html#setting-up-tiaas

nsoranzo commented 4 years ago

> I found a journalctf -u galaxy -f instead of journalctl -u galaxy -f in https://training.galaxyproject.org/training-material/topics/admin/tutorials/tiaas/tutorial.html#setting-up-tiaas

Fixed already in https://github.com/galaxyproject/training-material/pull/1836 , thanks for reporting anyway!

hexylena commented 4 years ago

gxit - leading spaces in paste

nsoranzo commented 4 years ago

> gxit - leading spaces in paste

https://github.com/galaxyproject/training-material/pull/1842

ondrejme commented 4 years ago

Hands-on: Enabling Interactive Tools in Galaxy

Step 3: I would suggest changing the order of "id" and "destination" in the tag, as it is with the other tool-destination mappings.

Step 4: interactivetools_enable: "True": remove the quotation marks and lowercase the capital letter (interactivetools_enable: true).

lldelisle commented 4 years ago

In https://training.galaxyproject.org/training-material/topics/admin/tutorials/ansible-galaxy/tutorial.html: if you do not want to use SSL, I guess you also need to change templates/nginx/galaxy.j2, because:

    # Listen on port 443
    listen        *:443 ssl default_server;

Will not work, right?

natefoo commented 4 years ago

@lldelisle If you changed this to listen *:80 default_server;, you should also move this template from nginx_ssl_servers to nginx_servers, remove redirect-ssl from nginx_servers, and comment out nginx_ssl_role. You would also need to remove /etc/nginx/sites-enabled/redirect-ssl. You could do this with a pre_task like:

- name: Remove redirect-ssl config
  file:
    path: /etc/nginx/sites-enabled/redirect-ssl
    state: absent

lldelisle commented 4 years ago

Many thanks... So the only thing which is missing in the training material is: change

    # Listen on port 443
    listen        *:443 ssl default_server;

to

    # Listen on port 80
    listen        *:80 default_server;

If you ran the playbook once with redirect-ssl before deciding not to use SSL, remove the file /etc/nginx/sites-enabled/redirect-ssl.
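
As a rough sketch of the variable changes described above (not a tested configuration), the relevant part of group_vars/galaxyservers.yml might end up looking like this. The variable names are the ones mentioned in this thread; the file location and the usegalaxy_eu.certbot value are assumptions based on the tutorial's usual layout.

```yaml
# group_vars/galaxyservers.yml (sketch, assuming the tutorial's layout)

# The galaxy template moves here, and redirect-ssl is no longer listed
nginx_servers:
  - galaxy

# No SSL servers and no SSL role when running on plain HTTP
# nginx_ssl_servers:
#   - galaxy
# nginx_ssl_role: usegalaxy_eu.certbot   # assumed value; comment out whichever role you had set
```

The listen line in templates/nginx/galaxy.j2 still has to be changed by hand as described above, since these variables only control which templates are deployed.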

lldelisle commented 4 years ago

In https://training.galaxyproject.org/training-material/topics/admin/tutorials/connect-to-compute-cluster/tutorial.html you wrote: "Add a post_task to your playbook to install slurm-drmaa1 (Debian/Ubuntu) or slurm-drmaa (RedHat/CentOS), and additionally include the galaxyproject.repos role". Then maybe you could use:

  post_tasks:
    - name: Install slurm-drmaa1 if Debian
      package:
        name: slurm-drmaa1
      when: ansible_os_family == "Debian"
    - name: Install slurm-drmaa if RedHat
      package:
        name: slurm-drmaa
      when: ansible_os_family == "RedHat"

(If I understood well...)
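
A possible alternative, as a sketch only: pick the package name with an inline Jinja2 expression so that a single task covers both families. This assumes fact gathering is enabled (so that ansible_os_family is defined) and treats every non-Debian host as RedHat-like, which may not hold for other distributions.

```yaml
  post_tasks:
    - name: Install the Slurm DRMAA library
      package:
        # slurm-drmaa1 on Debian/Ubuntu, slurm-drmaa otherwise (assumed RedHat/CentOS)
        name: "{{ 'slurm-drmaa1' if ansible_os_family == 'Debian' else 'slurm-drmaa' }}"
```

The two-task version above is more explicit and simply skips unknown families, so it may be the safer choice for a tutorial.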

lldelisle commented 4 years ago

To myself: ansible_python.version.major

hexylena commented 4 years ago

combination of statements and opinions from @natefoo @Slugger70 @mvdbeek @nsoranzo @hexylena and @shiltemann, synthesized into one summary/todo list.

Barcelona

This training was fantastic! And incredibly strange: things worked! Like, nearly flawlessly. We got through 5 days of content in 3. We had to come up with an extra 2 days.

A notable difference this time was how many students tried to run the playbooks immediately on their own infrastructure, either from the start on their own VMs, or after class on their own infra. Despite asking everyone to run it on the VM, we also had a couple of people brave enough to run from their own laptop, mostly without issue.

All around great set of participants! But it led us to focus on areas where we need to improve the materials.

Seeing the Effects

From @natefoo:

an idea I had: two column design on the tutorials where one column is the things you do in ansible and the other column is the effects it has on the system

this latest training went well but at times it felt very black-boxish, "just run these things and voila!"

For something like the ansible tutorial we could show a

$ cat /tmp/test.txt
some contents

In something like the galaxy tutorial we'd show all the changes to the system that each step makes. I'd say something like the latest commit on the release_XX.YY branch has been cloned to /srv/galaxy/server

In order to reduce how much it needs to be updated, we will just use this in the first two trainings where we need to show this effect (ansible, ansible-galaxy).

The students can then see the changes Ansible is making and gain the understanding they need to troubleshoot, as things never always "just work", especially when running on varied or outsourced hardware, with the large variety in quality of tools, etc.
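
For the ansible tutorial, the "things you do" column could be as small as the following task, with the cat output shown earlier as the matching "effect" column. This is only an illustrative sketch: the task name is made up, and the path and contents are taken from the example above.

```yaml
- name: Write a test file   # hypothetical task name
  copy:
    content: "some contents\n"
    dest: /tmp/test.txt
```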

"Real exercises"

We noted that a few students had issues with how ansible really works: variables being set in different places, which changes have which effects. So we're considering adding "real" exercises or hiding the answers a bit more for some of the ones we already have.

It's a tough balance to strike. For most of the questions & answers in ansible-galaxy, they're awful, they ask "how does your final config look" and everyone just copies that. Maybe we should rewrite them as "Here is the config." and ask better questions??? "what does this do?" "what effect will that have?"

We should show the students Ansible Best Practices at some point? Before the training? Or after the 1st day? https://docs.ansible.com/ansible/latest/user_guide/playbooks_best_practices.html

And we should consider developing "Ansible - advanced" or an ansible "exam" (CTF?) for the students, saying "ok, now that you know ansible, accomplish these tasks"

I also think that sometimes "just re-run the playbook" isn't enough. Figuring out why something has changed can sometimes be more important for the big picture than how to do it. (If that makes sense.)

Continuum

I think there's a continuum: at one end is "galaxy of a few years ago, where people needed to be programmers/tool devs/admins together, and we needed to teach everything in detail so they could debug", and at the other end is "galaxy (of the current/future) where things mostly just work, and they can just deploy it and not care too much since the documentation / tutorials cover all of the main points, and they don't resort to low level debugging".

If we're really moving to the "just works" end, maybe we remove that detail from the curriculum because it doesn't benefit students vs a higher level picture.

I think if they're gonna go back and not use ansible it's good to show "here's what this production deployment looks like" so they can adapt it for their own purposes

We sympathise with "ought to get an in-depth understanding", but:

It's two sides of a coin... people coming to a week long training probably ought to come away with a pretty low level understanding - but we've also found that it's really difficult to teach that low level understanding, especially to folks who mostly aren't sysadmins.

Which leads us to the next question:

What is "A Galaxy Admin"

What should students come away from GAT knowing how to do?

everything else is less important?

Splitting

We should include more on the splitting of roles amongst machines, and write them in a way they can be used as-is. E.g. transitioning from ident auth to network auth is complex (see next aside). A number of participants tried deploying the playbook on their own systems toward the end of the week and some struggled with getting the proper DB configuration.

So: the DB on a separate server as an example, and how to set up Ansible to do things like that. We should also talk in detail about production setups for a large user base and the benefits of automation for larger setups, and give some examples of tool maintenance etc.

There are now, I think, two different places in the tutorials where we say "if we were really doing best practices we'd create a new group and put vars in a different group vars file"; maybe we should just do that.

I'd see the following splitting for the whole week:

  1. db
  2. galaxy (+proxy +slurm submit +tiaas)
  3. compute-central manager
  4. compute exec
  5. pulsar
  6. monitoring (influx/grafana)

In ansible-galaxy, only one split (db + galaxy); that sounds manageable. And it is a good place to introduce this concept of "here is where you can divide your infrastructure".
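
A minimal sketch of what the db + galaxy split could look like, written as a YAML inventory plus a two-play playbook. The host names, the hosts.yml file name, and the dbservers group name are placeholders, and the role lists are abbreviated; galaxyservers is the group the tutorials already use.

```yaml
# hosts.yml (hypothetical inventory file; host names are placeholders)
all:
  children:
    dbservers:
      hosts:
        db.example.org:
    galaxyservers:
      hosts:
        galaxy.example.org:
---
# galaxy.yml (playbook; role lists abbreviated)
- hosts: dbservers
  become: true
  roles:
    - galaxyproject.postgresql

- hosts: galaxyservers
  become: true
  roles:
    - galaxyproject.galaxy
```

With this layout each group gets its own variables file (group_vars/dbservers.yml and group_vars/galaxyservers.yml), which is the "new group, different group vars file" practice mentioned above.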

DB Auth

let's bind to 127, and use md5, and make everyone use passwords. I think that would be a positive change over ident magic. (I mean, I love ident, but, it's difficult to switch / not obvious for students)
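
A sketch of how that might be expressed, assuming the galaxyproject.postgresql role's postgresql_conf and postgresql_pg_hba_conf variables and a password key on postgresql_objects_users; check the role documentation before relying on any of these. vault_galaxy_db_password is a hypothetical vaulted variable, and the port is assumed to be the PostgreSQL default.

```yaml
# Database side: listen on loopback only and require md5 password auth
postgresql_conf:
  - listen_addresses: "'127.0.0.1'"
postgresql_pg_hba_conf:
  - host all all 127.0.0.1/32 md5

postgresql_objects_users:
  - name: galaxy
    password: "{{ vault_galaxy_db_password }}"   # hypothetical vault variable

# Galaxy side: connect over TCP with the same password instead of ident
galaxy_config:
  galaxy:
    database_connection: "postgresql://galaxy:{{ vault_galaxy_db_password }}@127.0.0.1:5432/galaxy"
```

Because the connection string is already a host/port/password URL, moving the database to another machine later would mostly mean changing the address in both places.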

Conclusion

hexylena commented 3 years ago

WIP implementation of the side-by-side discussed during admin debriefing

image

hexylena commented 3 years ago

@annefou this might be interesting for you, too! Do you have any feedback on this? Authors have the choice of

natefoo commented 1 year ago

  • CVMFS/ref data
    • Make proper tutorial of this

https://github.com/galaxyproject/training-material/pull/3778

hexylena commented 1 year ago

In general I think enough of this is done to finally close it out.