galaxyproject / training-material

A collection of Galaxy-related training material
https://training.galaxyproject.org
MIT License

Post-GAT2021 Improvements #2274

Closed hexylena closed 2 years ago

hexylena commented 3 years ago

Some issues moved to https://github.com/galaxyproject/training-material/issues/2583

Misc

Validation

Role Changes

VM Environment

Training Updates

Tutorial Feedback

anyone

From https://github.com/galaxyproject/training-material/issues/989 :

Missing Config Options

Details

```
# Perf
database_engine_option_server_side_cursors: true
slow_query_log_threshold: 5
enable_per_request_sql_debugging: true
nginx_x_accel_redirect_base: /_x_accel_redirect

# watchdog library too?
watch_tools: 'auto' # That's only needed if you dump tools into a directory and expect new tools to show up. I wouldn't enable this on a production instance.
watch_job_rules: 'auto'

# Admin convenience
allow_path_paste: true
library_import_dir: /data/library
enable_quotas: true
cleanup_job: onerror
allow_user_deletion: true
allow_user_impersonation: true

# user convenience
show_welcome_with_login: true
expose_user_name: true
expose_dataset_path: true
expose_potentially_sensitive_job_metrics: true

# Other
outputs_to_working_directory: true
```

Diffs

indentation is not wrong, but the context is not what it should be
the diffs kind of bake an order of tutorials into them :confused:

they do
it's not great.

For me there was nothing ambiguous, but you may look for an anchor
if you don’t understand how yaml / xml works that might be confusing

Testing

I will not organise this again without a testing strategy. Many times, updates required changes to earlier tutorials, and then you do not know whether things will work from scratch or only on your already modified machine.

We need molecule tests end to end.

mvdbeek commented 3 years ago

More of a ansible-galaxy thing, but datasets should be stored by uuid, not id by default.
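
A minimal sketch of what that default could look like in the training's group variables, assuming the `galaxy_config` layout used by the ansible-galaxy role and Galaxy's `object_store_store_by` option. The file placement is an assumption, and switching this on an instance that already holds datasets needs more care than shown here.

```yaml
# group_vars/galaxyservers.yml (file placement is an assumption)
galaxy_config:
  galaxy:
    # Reference dataset files by UUID instead of the numeric database id
    object_store_store_by: uuid
```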

hexylena commented 3 years ago

We should use vault and set a secret-id for the rest of the training, not just day1
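
Roughly, this could look like the following, assuming a vault-encrypted `group_vars/secret.yml` and a variable named `vault_id_secret` (both names are illustrative and may differ from what the tutorials end up using):

```yaml
# group_vars/secret.yml, encrypted with ansible-vault (file name is an assumption)
vault_id_secret: "REPLACE_WITH_A_LONG_RANDOM_STRING"  # placeholder value

# group_vars/galaxyservers.yml
galaxy_config:
  galaxy:
    # Galaxy reads the secret from the vaulted variable, not from plain text
    id_secret: "{{ vault_id_secret }}"
```

Setting this from day 1 keeps encoded IDs stable for the rest of the week.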

nsoranzo commented 3 years ago

From https://github.com/galaxyproject/training-material/issues/989 :

nsoranzo commented 3 years ago

gantsign.golang role uses deprecated sha256sum parameter, which will be removed in ansible.base 2.14 (use checksum instead).

https://github.com/gantsign/ansible-role-golang/issues/183
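
For reference, the change in a `get_url` task would look something like the sketch below. This is an illustrative task, not the role's actual code; the URL and digest are placeholders.

```yaml
- name: Download Go tarball (illustrative task, not the role's actual code)
  ansible.builtin.get_url:
    url: "https://example.org/go.tar.gz"   # placeholder URL
    dest: /tmp/go.tar.gz
    # Deprecated form was: sha256sum: "<hex digest>"
    checksum: "sha256:<hex digest>"        # placeholder digest
```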

hexylena commented 3 years ago

I thought the spaces before + are indentations..

yeah we def need a copy+paste view of the diff. I've got some JS that does an OK job, just need to make the presentation better. Then we'll have a button to switch between "the real diff" vs "here are the lines you need to add."

hexylena commented 3 years ago

add validation for xml files pre-restart.

https://docs.ansible.com/ansible/latest/collections/ansible/builtin/copy_module.html#parameter-validate
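
A sketch of how that could look for one of the XML configs, assuming `xmllint` (from libxml2) is installed on the target host and the file is deployed with a plain `template` task rather than through the Galaxy role. The task itself is hypothetical.

```yaml
- name: Deploy job_conf.xml only if it is well-formed XML
  ansible.builtin.template:
    src: templates/galaxy/config/job_conf.xml.j2
    dest: "{{ galaxy_config_dir }}/job_conf.xml"
    # %s is replaced by the path of the temporary file being validated;
    # a non-zero exit from xmllint aborts the task before the file is replaced
    validate: xmllint --noout %s
```

This only checks well-formedness, not whether Galaxy accepts the configuration.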

hexylena commented 3 years ago

specifying that all commands (including ansible-galaxy) should be run in the intro directory, I had to rsync my new ~/roles folder to intro

fixed by using diffs everywhere I guess.

I was confused at first by the "service" service. More real, less abstract examples would be clearer, IMO

fair, but, it's also just to learn about ansible. not sure.

templates/nginx/galaxy.j2 -> "uwsgi_pass 127.0.0.1:8080" should not be configured statically and changed to a variable from the groups_vars if the port is changed there in the uwsgi variable settings

I think we want to integrate the systemd role into the galaxy role (which we should've done a while ago.) then this step will be skipped completely and simply not possible to do, which will resolve the many, many issues people have running uwsgi by hand (ports, permissions, etc)
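
Until that happens, one way to avoid the hard-coded address would be to reference the existing uWSGI setting from the template. A sketch, assuming the socket is defined under `galaxy_config.uwsgi.socket` as in the group variables:

```yaml
# group_vars/galaxyservers.yml
galaxy_config:
  uwsgi:
    socket: 127.0.0.1:8080

# templates/nginx/galaxy.j2 could then read, instead of a literal address:
#   uwsgi_pass {{ galaxy_config.uwsgi.socket }};
```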

note about using a non-Let's Encrypt certificate

We can link to https://github.com/galaxyproject/ansible-nginx#ssl-configuration in a box, but I don't think it's our responsibility to say much more?

nsoranzo commented 3 years ago

specifying that all commands (including ansible-galaxy) should be run in the intro directory, I had to rsync my new ~/roles folder to intro

fixed by using diffs everywhere I guess.

I think this is more a matter of adding some cd ~/intro/ or something like that?

nsoranzo commented 3 years ago

I'm coming at this as a non-galaxy user so jumping straight into the interface was initially a bit confusing, a quick video tour of the Galaxy interface (~5 minutes) beforehand would have made this easier for me

We could add https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-short/tutorial.html to the program (when running the 5-days course).

hexylena commented 3 years ago

:+1:, they should learn to be users themselves. I think for physical courses we've mostly filtered for people who are already running a small galaxy, but, online we have many more new people.

hexylena commented 3 years ago

Once you cd into the directory, autofs will automatically mount the repository and files will be listed.

that was confusing, it's in the solution box but refers to the step after it, we should reword

nsoranzo commented 3 years ago

https://github.com/galaxyproject/training-material/pull/2241 changed the ephemeris tutorial to install pilon instead of bwa, but bwa is used in the following CVMFS tutorial. Revert? In the meantime, I'm going to add instructions to install bwa to the CVMFS tutorial.

cat-bro commented 3 years ago

@nsoranzo it looks like the pulsar tutorial also assumes bwa is installed [Updated] The pulsar tutorial already has instructions for installing bwa

nsoranzo commented 3 years ago

@cat-bro Is it fine to get back to bwa in the ephemeris tuto?

hexylena commented 3 years ago

https://training.galaxyproject.org/training-material/topics/admin/tutorials/data-library/tutorial.html#from-history has the wrong screenshot, needs a tip about "if you use a diff email"

cat-bro commented 3 years ago

@nsoranzo I don't know. Given that half of the students will have already done this tutorial, it might be more confusing to revert it at this point? The half that haven't would be left with a video tut that is different from the document.

cat-bro commented 3 years ago

Either way, a tip for installing BWA is probably needed in both cvmfs and pulsar tutorials

hexylena commented 3 years ago

We can do a tip now (maybe make it a snippet that we can generically include in both) and then decide later on one or the other maybe?

hexylena commented 3 years ago

remove galaxy_zergpool_listen_addr from training

cat-bro commented 3 years ago

@hexylena @nsoranzo: re bwa (1) it seems to be ok following the cvmfs tutorial text, since the direction is to look at BWA or bowtie2, whichever is installed, and they do have bowtie2 (installed from the workflow tool list in the ephemeris tutorial). (2) the pulsar tutorial already has instructions for installing bwa.

nsoranzo commented 3 years ago

@hexylena @nsoranzo: re bwa (1) it seems to be ok following the cvmfs tutorial text, since the direction is to look at BWA or bowtie2, whichever is installed, and they do have bowtie2 (installed from the workflow tool list in the ephemeris tutorial).

True, but the CVMFS hands-on at point 4. says "Login to Galaxy as the admin user, and go to Admin → Data Tables → bwa_mem indexes" which doesn't make sense if you run bowtie2.

(2) the pulsar tutorial already has instructions for installing bwa.

Yes, I'm making a snippet out of that.

cat-bro commented 3 years ago

True, but the CVMFS hands-on at point 4. says "Login to Galaxy as the admin user, and go to Admin → Data Tables → bwa_mem indexes" which doesn't make sense if you run bowtie2.

This step does work without bwa being installed.

nsoranzo commented 3 years ago

From an attendee:

I have found something confusing in the tutorial https://training.galaxyproject.org/training-material/topics/admin/tutorials/cvmfs/tutorial.html. In "Hands-on: Installing CVMFS with Ansible", step 3 says "Edit the group variables file, group_vars/galaxyservers.yml", but then it says that these variables can be included in group_vars/all.yml. So I am not sure whether I need to edit anything in group_vars/galaxyservers.yml.

The complete sentence is "Add the following lines to your group_vars/all.yml file, creating it if it doesn’t exist" but above it says "Edit the group variables file, group_vars/galaxyservers.yml" (the latter seems wrong to me).

gmauro commented 3 years ago

Running Jobs on Remote Resources with Pulsar - job_metrics_config_file option missing #2302

hexylena commented 3 years ago

Add all this crap. I set up Galaxy for work and used it in anger... I was shocked at how much that I expect to be there was missing. We should be setting many of these at some point during the week.

# Perf
database_engine_option_server_side_cursors: true
slow_query_log_threshold: 5
enable_per_request_sql_debugging: true
nginx_x_accel_redirect_base: /_x_accel_redirect

# watchdog library too?
watch_tools: 'auto'
watch_job_rules: 'auto'

# Admin convenience
allow_path_paste: true
library_import_dir: /data/library
enable_quotas: true
cleanup_job: onerror
allow_user_deletion: true
allow_user_impersonation: true

# user convenience
show_welcome_with_login: true
expose_user_name: true
expose_dataset_path: true
expose_potentially_sensitive_job_metrics: true

# Other
outputs_to_working_directory: true

nsoranzo commented 3 years ago

True, I was also comparing the outcome of the ansible-galaxy tutorial with what I have in production and noticed some of these major missing bits.

natefoo commented 3 years ago

templates/galaxy/config/object_store_conf.xml in the Distributed Object Storage tutorial should be a .xml.j2.
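
With the rename, the corresponding entry in the group variables would look something like this, assuming the file is deployed through the ansible-galaxy role's `galaxy_config_templates` list and that `object_store_config_file` is set in `galaxy_config`:

```yaml
galaxy_config_templates:
  # The .j2 suffix marks the source as a Jinja2 template, matching the rest of the training
  - src: templates/galaxy/config/object_store_conf.xml.j2
    dest: "{{ galaxy_config.galaxy.object_store_config_file }}"
```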

natefoo commented 3 years ago

In Running Jobs on Remote Resources with Pulsar the variable galaxy_server_url should be named galaxy_server_hostname or galaxy_server_address or something similar, since it's an FQDN (or IP) rather than a URL.

Slugger70 commented 3 years ago

From the Slack:

Note for "Galaxy Monitoring with Reports". Step 4 should put the location of the reports app in the templates/nginx/galaxy.j2 file, not in group_vars/galaxyservers.yml

hexylena commented 3 years ago

cgroup-tools missing from job metrics.
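
Something like the task below could cover it, assuming an Ubuntu host where the package is named `cgroup-tools`. Where the task lives in the playbook (for example in `pre_tasks`) is left open.

```yaml
- name: Install cgroup-tools for the cgroup job metrics plugin
  ansible.builtin.package:
    name: cgroup-tools
    state: present
  become: true  # package installation needs root
```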

hexylena commented 3 years ago

tiaas

We next need to configure this plugin in our job configuration (files/galaxy/config/job_conf.xml): Should be templates/galaxy/config/job_conf.xml.j2 to match rest of training?

reports

tip box on how to secure reports

natefoo commented 3 years ago

GxIT training breaks Singularity training (the container_resolvers_conf.xml is invalid for Docker).

hexylena commented 3 years ago

So what exactly is a “sensible” value for this? Currently I am using data managers for a select number of references. My biggest item as of now is a 32GB RNAStar index of HG38.

on my production machine I have a 100 GB cache (todo: xref playbooks)
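
Assuming "this" refers to the CVMFS role's `cvmfs_quota_limit` variable (given in MB), a cache of that size would be set roughly like so:

```yaml
# Assumption: the question is about cvmfs_quota_limit, which is expressed in MB
cvmfs_quota_limit: 100000  # roughly 100 GB
```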

mvdbeek commented 3 years ago

watch_tools: 'auto'

That's only needed if you dump tools into a directory and expect new tools to show up. I wouldn't enable this on a production instance.

torfinnnome commented 3 years ago

Singularity and volume binding: When adding more object stores, it seems job_conf:$singularity_defaults is not populated with these new paths?

Fix: Add a singularity_volumes parameter to job_conf.xml, to include the new data volume(s): https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/config/sample/job_conf.xml.sample_advanced#L566

(Or make Galaxy add these automagically?)

hexylena commented 3 years ago

from @Slugger70 on email

Erlang and Rabbit have a very “interesting” relationship. Different versions of RabbitMQ are very dependent on a particular version of Erlang. I have to pin the versions to be installed in my playbooks as the defaults don’t always work.

can you share those pins back and let's get them in the playbooks so users don't have these issues? (if that's possible)

cat-bro commented 2 years ago

I'm happy to take on making some small changes to the ephemeris tutorial.

(1) Switch the installed tool back to bwa and choose the tool for the testing step from any of the installed tools (bwa, bowtie2, bam_filter, etc.).

(2) Feedback comment: "The flow of the tutorial feels awkward in places - you extract the workflow but then install a tool singly before going back to the extracted .yml to do a batch." I'm wondering whether switching the order of steps from the current "workflow-to-tools", "install one tool", "install workflow tools" to "install one tool", "workflow-to-tools", "install workflow tools" would help with this.

(3) Feedback comments: "Not directly related to this tutorial but coming from the previous Galaxy setup tutorials, I'm left thinking - what happened to Ansible and the concept of reinstalling the entire Galaxy in one playbook?" and "It could be explained how to include these tasks in the ansible playbook (if possible) in the case of a full re-installation of Galaxy. Or maybe better separate the two steps..." I remember feeling the same way when I first took Galaxy admin courses. Now, having spent more time with Galaxy, I see tool management as something separate that I would not include in a set of infrastructure playbooks. Maybe to address this there could be a slide talking about the Galaxy API and how there are lots of things we want to be able to do outside of the main infrastructure setup.

hexylena commented 2 years ago

Fantastic! That sounds great! I like the new ordering that's proposed

natefoo commented 2 years ago

maybe add https://docs.galaxyproject.org/en/latest/admin/nginx.html#receiving-files-with-nginx too? it's better for performance

Chunked uploading fixes this for the UI. The cases where this would still be useful are scripted uploads and, if you set it up for the job files API (not sure if we documented this anywhere but the .org configs have it), Pulsar transfers. I think we would want a means for dynamically compiling the upload module before adding this as well since our nginx packages with the static module are not well maintained.

natefoo commented 2 years ago

check that the file sending we added actually works

I used to have manual verification with this using wget/curl in the salt lake version of the tutorial, I'll try to dig it up and see if we can automate it.

mvdbeek commented 2 years ago

Chunked uploading fixes this for the UI.

It's still not great for performance since the individual chunks still need to pass through the web handlers, the old upload module or https://github.com/pgaertig/nginx-big-upload are better for overall performance.

hexylena commented 2 years ago

Chunked uploading fixes this for the UI

the performance was complete garbage at EU. Nginx would buffer it once, uwsgi would re-buffer it again to a different location, then the chunked module would reassemble. Swapping to nginx made a massive difference in web handler responsiveness during big uploads (since we were thrashing our disk less too)

cat-bro commented 2 years ago

I think we would want a means for dynamically compiling the upload module before adding this as well since our nginx packages with the static module are not well maintained.

Oz has a local role (ubuntu-oriented) for this that we use alongside the galaxyproject.nginx role: https://github.com/usegalaxy-au/infrastructure/tree/master/roles/nginx-upload-module

natefoo commented 2 years ago

@cat-bro awesome, thank you!

natefoo commented 2 years ago

validation for nginx config in nginx role. @natefoo

galaxyproject/ansible-nginx#11

Slugger70 commented 2 years ago

The version of rabbit that we use needs to be set to 3.8.16

from @Slugger70 on email

Erlang and Rabbit have a very “interesting” relationship. Different versions of RabbitMQ are very dependent on a particular version of Erlang. I have to pin the versions to be installed in my playbooks as the defaults don’t always work.

can you share those pins back and let's get them in the playbooks so users don't have these issues? (if that's possible)

We need to override the version of rabbitmq that we install to the latest one (3.8.16) that matches the new default erlang (24.x) installed in Ubuntu 20.04. Otherwise any apt update && apt upgrade will install the latest erlang no matter which version we have pinned in the playbook.

I will add the version to the Pulsar tutorial.

See: https://www.rabbitmq.com/which-erlang.html for details.
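
In the group variables this would be a one-line pin. The variable name `rabbitmq_version` and the exact version string format are assumptions here; check the defaults of the RabbitMQ role in use before copying.

```yaml
# Variable name and version format are assumptions; check the RabbitMQ role's defaults
rabbitmq_version: "3.8.16"  # pairs with the Erlang 24.x that Ubuntu 20.04 installs
```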

natefoo commented 2 years ago

@Slugger70 I wonder if we should use one of the repos in the RabbitMQ install docs for Debian/Ubuntu to install an updated Erlang? They make it sound like it should install the correct Erlang for the selected RabbitMQ if you're using Cloudsmith (or maybe Packagecloud?), but maybe it doesn't work so magically as they imply.

natefoo commented 2 years ago

Running Jobs on Remote Resources with Pulsar - job_metrics_config_file option missing #2302

WIP, please comment: galaxyproject/ansible-galaxy#133

If this looks like the way forward we can just default pulsar_job_metrics_plugins to galaxy_job_metrics_plugins after updating the Pulsar role to use the new syntax and write YAML configs instead of XML.
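
A rough sketch of where that could end up, assuming the list-of-plugins YAML syntax from the WIP PR is kept. The plugin selection is only an example.

```yaml
# Shared job metrics definition (syntax assumed from the WIP PR)
galaxy_job_metrics_plugins:
  - type: core
  - type: cpuinfo
  - type: meminfo
  - type: uname
  - type: env
  - type: cgroup

# Pulsar reuses Galaxy's list unless explicitly overridden
pulsar_job_metrics_plugins: "{{ galaxy_job_metrics_plugins }}"
```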