Post-GAT2021 Improvements

hexylena commented 3 years ago

Some issues moved to https://github.com/galaxyproject/training-material/issues/2583

Misc

[ ] check that the file sending we added actually works
[x] update tiaas role if not done already
[x] add note about which ports need to be exposed for people not using our machines (Thanks @dickgroenenberg!)

Validation

[ ] validation for XML in galaxy role xref
[x] validation for nginx config in nginx role. @natefoo

Role Changes

[x] Remove listening on HTTP port 5000 @hexylena
[x] merge systemd role into galaxy, change uwsgi port @hexylena
[x] remove galaxy_zergpool_listen_addr from training? @hexylena
[x] fix influx role
[x] In Running Jobs on Remote Resources with Pulsar the variable galaxy_server_url should be named galaxy_server_hostname or galaxy_server_address or something similar, since it's an FQDN (or IP) rather than a URL. anyone

VM Environment

[x] swap default editor to nano (esp. for vault)
[x] we could add Cockpit to the attendees' VMs: login with user+password gives access to a terminal session and logs, so people on Windows don't need to install PuTTY. It uses port 8080, which we presently use for Galaxy uwsgi; we should change the Galaxy port as part of merging the systemd role (I've added a note above, it's good to leave that port free anyway). Security-minded attendees may complain of an increased attack surface, so add a pointer to cockpit-project.org/blog/is-cockpit-secure.html
[x] Add instructions for helpers and/or coordinators on how to use gat to the Helpers.md page
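A minimal sketch of the nano swap, assuming the VMs are provisioned with Ansible and that a system-wide `EDITOR` in `/etc/environment` is acceptable. `ansible-vault edit` opens whatever `$EDITOR` names and falls back to vi when it is unset, which is why vault is the main motivation.

```yaml
# Sketch: make nano the default editor for login sessions on the VM.
# /etc/environment is read by pam_env at login, so users need to log in again.
- name: Use nano as the default editor (e.g. for ansible-vault edit)
  ansible.builtin.lineinfile:
    path: /etc/environment
    regexp: '^EDITOR='
    line: 'EDITOR=nano'
```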

Training Updates

[x] link to galaxyproject/ansible-nginx#ssl-configuration in a box next to LE bits
[x] More of an ansible-galaxy thing, but datasets should be stored by uuid, not id by default.
[x] We should use vault and set a secret-id for the rest of the training, not just day1
[x] We can link to galaxyproject/ansible-nginx#ssl-configuration in a box, but I don't think it's our responsibility to say much more?
[x] We could add training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-short/tutorial.html to the program (when running the 5-day course).
[x] "Once you cd into the directory, autofs will automatically mount the repository and files will be listed." that was confusing, it's in the solution box but refers to the step after it, we should reword
[x] training.galaxyproject.org/training-material/topics/admin/tutorials/data-library/tutorial.html#from-history has the wrong screenshot, needs a tip about "if you use a diff email" @hexylena
[x] Running Jobs on Remote Resources with Pulsar - job_metrics_config_file option missing https://github.com/galaxyproject/ansible-pulsar/issues/19 @natefoo
[x] More config options (See Missing Config Options below)
[x] templates/galaxy/config/object_store_conf.xml in the Distributed Object Storage tutorial should be a .xml.j2.
[x] cgroup-tools missing from job metrics.
[x] TIaaS: "We next need to configure this plugin in our job configuration (files/galaxy/config/job_conf.xml)" should be templates/galaxy/config/job_conf.xml.j2 to match the rest of the training?
[x] Reports: tip box on how to secure Reports. anyone
[x] Document sensible value for CVMFS cache (100 GB for simon)
[ ] Singularity and volume binding: When adding more object stores, it seems job_conf:$singularity_defaults is not populated with these new paths? Fix: Add a singularity_volumes parameter to job_conf.xml, to include the new data volume(s): https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/config/sample/job_conf.xml.sample_advanced#L566 (Or make Galaxy add these automagically?) @mvdbeek

Tutorial Feedback

anyone

From https://github.com/galaxyproject/training-material/issues/989 :

[x] Ansible:
- [ ] ~More information on using git repos with ansible would be helpful~ No.
- [x] specifying that all commands (including ansible-galaxy) should be run in the intro directory, I had to rsync my new ~/roles folder to intro
- [ ] ~I was confused at first by the "service" service. More real, less abstract examples would be clearer, IMO~ Helena says no.
[x] Galaxy Installation with Ansible:
- [x] templates/nginx/galaxy.j2 -> "uwsgi_pass 127.0.0.1:8080" should not be configured statically and changed to a variable from the groups_vars if the port is changed there in the uwsgi variable settings
- [x] note about using a non-Let's Encrypt certificate
- [x] for me as a noob some diagrams or schemes would often be helpful to see how things relate to each other
[x] Use Singularity containers for running Galaxy jobs
- [x] I'm coming at this as a non-galaxy user so jumping straight into the interface was initially a bit confusing, a quick video tour of the Galaxy interface (~5 minutes) beforehand would have made this easier for me
- [x] "Modify the file" parts (e.g. points 1. and 5. of "Hands-on: Configure Galaxy to use Singularity") are clear and a useful exercise to better understand the ansible and galaxy hierarchy, but if for some reason you made a mistake in a previous step, it could be useful to also have a snippet of the whole modified code to speed up the correction process and avoid backtracking.
- [x] Maybe some notes about what may require proxies
[ ] Galaxy Tool Management with Ephemeris
- [ ] The flow of the tutorial feels awkward in places - you extract the workflow but then install a tool singly before going back to the extracted .yml to do a batch.
- [ ] Not directly related to this tutorial but coming from the previous Galaxy setup tutorials, I'm left thinking - what happened to Ansible and the concept of reinstalling the entire Galaxy in one playbook?
- [ ] It could be explained how to include these tasks in the ansible playbook (if possible) in the case of a full re-installation of Galaxy. Or maybe better separate the two steps...
[x] Reference Data with CVMFS:
- [x] For the most used datasets (for ex. hg38) could we have a local copy, or would that be irrelevant? Could you explain how to calculate a good cache space? If I use a cluster, will I need to configure this FS in each node (given that the folder is at / directly)?
[x] Connecting Galaxy to a compute cluster:
- [x] The references to pulsar in the examples could be confusing, might be worth adding a warning for anyone who is going through this tutorial before the pulsar tutorial
- [ ] ~Would like info on adding to existing clusters (e.g., SGE)~ Helena says no
- [x] I got stuck at the part of editing templates/galaxy/config/job_conf.xml.j2 because some lines differed from the file resulting from the previous session (namely, singularity was set as default), and I had to compare the file shown in the video with the file I had. It took some time, but it worked in the end. It seems not so complicated now, but it will be when connecting to a living cluster. What happens when I have SLURM already configured on the server? And MUNGE (it made some nodes crash here because of very large log files), do I need to configure it in the cluster? It was not clear.
[x] Mapping Jobs to Destinations:
- [x] The Python code and some of the xml seems to paste into the cli with loads of new tab characters, in vim I used ':set paste' to switch off auto indent. Doesn't happen with the yml though.
- [x] This task includes many layers of complexity. It would be nice if, at the beginning or end of each subtopic, the needed changes were pointed out in the file tree. For example, using the 'tree' command and then highlighting all the files that have to be created / edited for this feature to work. It is just for better visualization of the modifications. I get something useful when calling git status.
[x] Recording Job Metrics:
- [x] Regarding 'expose_potentially_sensitive_job_metrics' , is there an option to expose different metrics for admins and general users? For example, creating different profiles in templates/galaxy/config/job_metrics_conf.xml.j2
[x] Running Jobs on Remote Resources with Pulsar:
- [x] The tutorial assumes a bit more knowledge than a lot of the others so it won't be as useful for someone who comes to it stand-alone as a pulsar via ansible setup guide.
[x] Distributed Object Storage:
- [x] "Warning: switching object store types will cause issues" - suggest putting that at the top and emphasise that this is a tutorial that shouldn't be blindly followed on a proper install. The S3 section assumed quite a lot of knowledge - I didn't understand, but expect someone who manages data in an S3 bucket will!
[x] Galaxy Monitoring with Telegraf and Grafana:
- [x] I found the content on Grafana and monitoring/alerts really confusing, it felt almost like it is for an older version of Grafana.
[x] Galaxy Monitoring with Reports:
- [x] Add a short section on using nginx basic authentication to secure it from public eyes.
[x] Training Infrastructure as a Service (TIaaS):
- [x] "We next need to configure this plugin in our job configuration (files/galaxy/config/job_conf.xml)": Should be templates/galaxy/config/job_conf.xml.j2 to match rest of training?

Missing Config Options


```
# Perf
database_engine_option_server_side_cursors: true
slow_query_log_threshold: 5
enable_per_request_sql_debugging: true
nginx_x_accel_redirect_base: /_x_accel_redirect

# watchdog library too?
watch_tools: 'auto'  # That's only needed if you dump tools into a directory and expect new tools to show up. I wouldn't enable this on a production instance.
watch_job_rules: 'auto'

# Admin convenience
allow_path_paste: true
library_import_dir: /data/library
enable_quotas: true
cleanup_job: onerror
allow_user_deletion: true
allow_user_impersonation: true

# user convenience
show_welcome_with_login: true
expose_user_name: true
expose_dataset_path: true
expose_potentially_sensitive_job_metrics: true

# Other
outputs_to_working_directory: true
```

Diffs

indentation is not wrong, but the context is not what it should be
the diffs kind of bake an order of tutorials into them :confused:

they do
it's not great.

For me there was nothing ambiguous, but you may look for an anchor
if you don’t understand how yaml / xml works that might be confusing

Testing

I will not organise this again without a testing strategy. There were many times where updates required changes to earlier tutorials, and then you do not know whether things will work from scratch or only on your already-modified machine.

We need molecule tests end to end.
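As a possible starting point, a minimal Molecule scenario file (`molecule/default/molecule.yml`) could look like this. The Docker driver and the Ubuntu 20.04 image are assumptions, and the scenario's `converge.yml` would still need to apply the tutorial playbooks in order against a fresh instance to count as end-to-end.

```yaml
# molecule/default/molecule.yml (sketch)
dependency:
  name: galaxy          # install roles from requirements.yml before converging
driver:
  name: docker
platforms:
  - name: gat-vm
    image: ubuntu:20.04 # assumption: close to the training VMs
provisioner:
  name: ansible
verifier:
  name: ansible
```

Galaxy is run under systemd in the training, so in practice the container image would need systemd inside it (or a VM-based driver would be used instead).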

mvdbeek commented 3 years ago

More of an ansible-galaxy thing, but datasets should be stored by uuid, not id by default.
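A sketch of what this looks like from the playbook side, using Galaxy's `object_store_store_by` option under the role's `galaxy_config`. Since existing files are not renamed, it is safest to decide this on a fresh install.

```yaml
# group_vars/galaxyservers.yml (sketch)
galaxy_config:
  galaxy:
    # Name dataset files after the dataset UUID rather than the integer
    # database id.
    object_store_store_by: uuid
```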

hexylena commented 3 years ago

We should use vault and set a secret-id for the rest of the training, not just day1
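A minimal sketch of keeping `id_secret` (the Galaxy option used to encode database ids in URLs) in vault. The file name `group_vars/secret.yml` and the variable `vault_id_secret` are illustrative, and the playbook is assumed to load the secret file (e.g. via `vars_files`).

```yaml
# group_vars/secret.yml, encrypted with: ansible-vault encrypt group_vars/secret.yml
vault_id_secret: "replace-with-a-long-random-string"
---
# group_vars/galaxyservers.yml
galaxy_config:
  galaxy:
    id_secret: "{{ vault_id_secret }}"
```

Changing `id_secret` later changes every encoded id, so links shared earlier stop working; it is worth setting once on day 1 and keeping it.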

nsoranzo commented 3 years ago

From https://github.com/galaxyproject/training-material/issues/989 :

[ ] Ansible:
- [ ] More information on using git repos with ansible would be helpful
- [ ] specifying that all commands (including ansible-galaxy) should be run in the intro directory, I had to rsync my new ~/roles folder to intro
- [ ] I was confused at first by the "service" service. More real, less abstract examples would be clearer, IMO
[ ] Galaxy Installation with Ansible:
- [ ] templates/nginx/galaxy.j2 -> "uwsgi_pass 127.0.0.1:8080" should not be configured statically and changed to a variable from the groups_vars if the port is changed there in the uwsgi variable settings
- [ ] note about using a non-Let's Encrypt certificate
- [ ] for me as a noob some diagrams or schemes would often be helpful to see how things relate to each other
[ ] Use Singularity containers for running Galaxy jobs
- [ ] I'm coming at this as a non-galaxy user so jumping straight into the interface was initially a bit confusing, a quick video tour of the Galaxy interface (~5 minutes) beforehand would have made this easier for me
- [ ] "Modify the file" parts (e.g. points 1. and 5. of "Hands-on: Configure Galaxy to use Singularity") are clear and a useful exercise to better understand the ansible and galaxy hierarchy, but if for some reason you made a mistake in a previous step, it could be useful to also have a snippet of the whole modified code to speed up the correction process and avoid backtracking.
- [ ] Maybe some notes about what may require proxies
[ ] Galaxy Tool Management with Ephemeris
- [ ] The flow of the tutorial feels awkward in places - you extract the workflow but then install a tool singly before going back to the extracted .yml to do a batch.
- [ ] Not directly related to this tutorial but coming from the previous Galaxy setup tutorials, I'm left thinking - what happened to Ansible and the concept of reinstalling the entire Galaxy in one playbook?
- [ ] It could be explained how to include these tasks in the ansible playbook (if possible) in the case of a full re-installation of Galaxy. Or maybe better separate the two steps...
[ ] Reference Data with CVMFS:
- [ ] For the most used datasets (for ex. hg38) could we have a local copy, or would that be irrelevant? Could you explain how to calculate a good cache space? If I use a cluster, will I need to configure this FS in each node (given that the folder is at / directly)?
[ ] Connecting Galaxy to a compute cluster:
- [ ] The references to pulsar in the examples could be confusing, might be worth adding a warning for anyone who is going through this tutorial before the pulsar tutorial
- [ ] Would like info on adding to existing clusters (e.g., SGE)
- [ ] I got stuck at the part of editing templates/galaxy/config/job_conf.xml.j2 because some lines differed from the file resulting from the previous session (namely, singularity was set as default), and I had to compare the file shown in the video with the file I had. It took some time, but it worked in the end. It seems not so complicated now, but it will be when connecting to a living cluster. What happens when I have SLURM already configured on the server? And MUNGE (it made some nodes crash here because of very large log files), do I need to configure it in the cluster? It was not clear.
[ ] Mapping Jobs to Destinations:
- [ ] The Python code and some of the xml seems to paste into the cli with loads of new tab characters, in vim I used ':set paste' to switch off auto indent. Doesn't happen with the yml though.
- [ ] This task includes many layers of complexity. It would be nice if, at the beginning or end of each subtopic, the needed changes were pointed out in the file tree. For example, using the 'tree' command and then highlighting all the files that have to be created / edited for this feature to work. It is just for better visualization of the modifications. I get something useful when calling git status.
[ ] Recording Job Metrics:
- [ ] Regarding 'expose_potentially_sensitive_job_metrics' , is there an option to expose different metrics for admins and general users? For example, creating different profiles in templates/galaxy/config/job_metrics_conf.xml.j2
[ ] Running Jobs on Remote Resources with Pulsar:
- [ ] The tutorial assumes a bit more knowledge than a lot of the others so it won't be as useful for someone who comes to it stand-alone as a pulsar via ansible setup guide.
[ ] Distributed Object Storage:
- [ ] "Warning: switching object store types will cause issues" - suggest putting that at the top and emphasise that this is a tutorial that shouldn't be blindly followed on a proper install. The S3 section assumed quite a lot of knowledge - I didn't understand, but expect someone who manages data in an S3 bucket will!
[ ] Galaxy Monitoring with Telegraf and Grafana:
- [ ] I found the content on Grafana and monitoring/alerts really confusing, it felt almost like it is for an older version of Grafana.
[ ] Galaxy Monitoring with Reports:
- [ ] Add a short section on using nginx basic authentication to secure it from public eyes.
[ ] Training Infrastructure as a Service (TIaaS):
- [ ] "We next need to configure this plugin in our job configuration (files/galaxy/config/job_conf.xml)": Should be templates/galaxy/config/job_conf.xml.j2 to match rest of training?

nsoranzo commented 3 years ago

gantsign.golang role uses deprecated sha256sum parameter, which will be removed in ansible.base 2.14 (use checksum instead).

https://github.com/gantsign/ansible-role-golang/issues/183
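For reference, a sketch of the non-deprecated form on `ansible.builtin.get_url`, where the algorithm is folded into the `checksum` value. The role's actual task may look different; `go_version` and `go_sha256` are hypothetical variables.

```yaml
# Sketch: get_url with checksum instead of the deprecated sha256sum parameter.
- name: Download the Go tarball and verify it
  ansible.builtin.get_url:
    url: "https://dl.google.com/go/go{{ go_version }}.linux-amd64.tar.gz"
    dest: "/tmp/go{{ go_version }}.linux-amd64.tar.gz"
    # Before: sha256sum: "{{ go_sha256 }}"
    checksum: "sha256:{{ go_sha256 }}"
```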

hexylena commented 3 years ago

I thought the spaces before + are indentations...

yeah we def need a copy+paste view of the diff. I've got some JS that does an OK job, just need to make the presentation better. Then we'll have a button to switch between "the real diff" vs "here are the lines you need to add."

hexylena commented 3 years ago

add validation for xml files pre-restart.

https://docs.ansible.com/ansible/latest/collections/ansible/builtin/copy_module.html#parameter-validate
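A minimal sketch of that idea: `validate` renders to a temporary file, runs the command with `%s` replaced by that path, and only moves the file into place if the command succeeds, so a broken XML never reaches the restart handler. The destination path is illustrative, and `xmllint` (from `libxml2-utils` on Debian/Ubuntu) only checks well-formedness, not Galaxy's own schema. The same `validate` parameter exists on `ansible.builtin.copy`.

```yaml
# Sketch: refuse to install malformed XML config files.
- name: Ensure xmllint is available
  ansible.builtin.package:
    name: libxml2-utils
    state: present

- name: Deploy job_conf.xml only if it is well-formed XML
  ansible.builtin.template:
    src: templates/galaxy/config/job_conf.xml.j2
    dest: /srv/galaxy/config/job_conf.xml   # illustrative path
    validate: xmllint --noout %s
```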

hexylena commented 3 years ago

specifying that all commands (including ansible-galaxy) should be run in the intro directory, I had to rsync my new ~/roles folder to intro

fixed by using diffs everywhere I guess.

I was confused at first by the "service" service. More real, less abstract examples would be clearer, IMO

fair, but, it's also just to learn about ansible. not sure.

templates/nginx/galaxy.j2 -> "uwsgi_pass 127.0.0.1:8080" should not be configured statically and changed to a variable from the groups_vars if the port is changed there in the uwsgi variable settings

I think we want to integrate the systemd role into the galaxy role (which we should've done a while ago.) then this step will be skipped completely and simply not possible to do, which will resolve the many, many issues people have running uwsgi by hand (ports, permissions, etc)
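Until that merge happens, a sketch of defining the port once, assuming the uWSGI socket is set through `galaxy_config.uwsgi` as in the training's group_vars. `galaxy_uwsgi_port` is a hypothetical variable name; the nginx template line would then read `uwsgi_pass 127.0.0.1:{{ galaxy_uwsgi_port }};`.

```yaml
# group_vars/galaxyservers.yml (sketch; galaxy_uwsgi_port is hypothetical)
galaxy_uwsgi_port: 8080

galaxy_config:
  uwsgi:
    # uWSGI listens on the same port that nginx proxies to
    socket: "127.0.0.1:{{ galaxy_uwsgi_port }}"
```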

note about using a non-Let's Encrypt certificate

We can link to https://github.com/galaxyproject/ansible-nginx#ssl-configuration in a box, but I don't think it's our responsibility to say much more?

nsoranzo commented 3 years ago

specifying that all commands (including ansible-galaxy) should be run in the intro directory, I had to rsync my new ~/roles folder to intro

fixed by using diffs everywhere I guess.

I think this is more a matter of adding some cd ~/intro/ or something like that?

nsoranzo commented 3 years ago

Suggestion from @hexylena: we could add Cockpit to the attendees' VMs: login with user+password gives access to a terminal session and logs, so people on Windows don't need to install PuTTY.
- It uses port 8080 which we presently use for Galaxy uwsgi, we should change the Galaxy port as part of merging the systemd role (I've added a note above, it's good to leave that port free anyway)
- Security-minded attendees may complain of an increased attack surface, add a pointer to https://cockpit-project.org/blog/is-cockpit-secure.html

nsoranzo commented 3 years ago

I'm coming at this as a non-galaxy user so jumping straight into the interface was initially a bit confusing, a quick video tour of the Galaxy interface (~5 minutes) beforehand would have made this easier for me

We could add https://training.galaxyproject.org/training-material/topics/introduction/tutorials/galaxy-intro-short/tutorial.html to the program (when running the 5-day course).

hexylena commented 3 years ago

:+1:, they should learn to be users themselves. I think for physical courses we've mostly filtered for people who are already running a small galaxy, but, online we have many more new people.

hexylena commented 3 years ago

Once you cd into the directory, autofs will automatically mount the repository and files will be listed.

that was confusing, it's in the solution box but refers to the step after it, we should reword

nsoranzo commented 3 years ago

https://github.com/galaxyproject/training-material/pull/2241 changed the ephemeris tutorial to install pilon instead of bwa, but bwa is used in the following CVMFS tutorial. Revert? In the meantime, I'm going to add instructions to install bwa to the CVMFS tutorial.

cat-bro commented 3 years ago

@nsoranzo it looks like the pulsar tutorial also assumes bwa is installed [Updated] The pulsar tutorial already has instructions for installing bwa

nsoranzo commented 3 years ago

@cat-bro Is it fine to get back to bwa in the ephemeris tutorial?

hexylena commented 3 years ago

https://training.galaxyproject.org/training-material/topics/admin/tutorials/data-library/tutorial.html#from-history has the wrong screenshot, needs a tip about "if you use a diff email"

cat-bro commented 3 years ago

@nsoranzo I don't know. Given that half of the students will have already done this tutorial, it might be more confusing to revert it at this point? The half that haven't would be left with a video tutorial that is different from the document.

cat-bro commented 3 years ago

Either way, a tip for installing BWA is probably needed in both cvmfs and pulsar tutorials

hexylena commented 3 years ago

We can do a tip now (maybe make it a snippet that we can generically include in both) and then decide later on one or the other maybe?
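A sketch of what such a shared snippet could contain: an Ephemeris tool list that installs bwa. The tool panel section label is an assumption and should match whatever the tutorials already use. It would be installed with something like `shed-tools install -g https://your-galaxy -a <admin API key> -t bwa.yml`.

```yaml
# bwa.yml: Ephemeris tool list (sketch; section label is an assumption)
tools:
  - name: bwa
    owner: devteam
    tool_shed_url: https://toolshed.g2.bx.psu.edu
    tool_panel_section_label: Mapping
```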

hexylena commented 3 years ago

remove galaxy_zergpool_listen_addr from training

cat-bro commented 3 years ago

@hexylena @nsoranzo: re bwa (1) it seems to be ok following the cvmfs tutorial text, since the direction is to look at BWA or bowtie2, whichever is installed, and they do have bowtie2 (installed from the workflow tool list in the ephemeris tutorial). (2) the pulsar tutorial already has instructions for installing bwa.

nsoranzo commented 3 years ago

@hexylena @nsoranzo: re bwa (1) it seems to be ok following the cvmfs tutorial text, since the direction is to look at BWA or bowtie2, whichever is installed, and they do have bowtie2 (installed from the workflow tool list in the ephemeris tutorial).

True, but the CVMFS hands-on at point 4. says "Login to Galaxy as the admin user, and go to Admin → Data Tables → bwa_mem indexes" which doesn't make sense if you run bowtie2.

(2) the pulsar tutorial already has instructions for installing bwa.

Yes, I'm making a snippet out of that.

cat-bro commented 3 years ago

True, but the CVMFS hands-on at point 4. says "Login to Galaxy as the admin user, and go to Admin → Data Tables → bwa_mem indexes" which doesn't make sense if you run bowtie2.

This step does work without bwa being installed.

nsoranzo commented 3 years ago

From an attendee:

I found something confusing in the tutorial https://training.galaxyproject.org/training-material/topics/admin/tutorials/cvmfs/tutorial.html. In "Hands-on: Installing CVMFS with Ansible", step 3 says "Edit the group variables file, group_vars/galaxyservers.yml", but then it says that these variables can be included in group_vars/all.yml. So I am not sure whether I need to edit anything in group_vars/galaxyservers.yml.

The complete sentence is "Add the following lines to your group_vars/all.yml file, creating it if it doesn’t exist" but above it says "Edit the group variables file, group_vars/galaxyservers.yml" (the latter seems wrong to me).

gmauro commented 3 years ago

Running Jobs on Remote Resources with Pulsar - job_metrics_config_file option missing #2302

hexylena commented 3 years ago

fix influx role
add cleanup cron job role

hexylena commented 3 years ago

Add all this crap. I set up Galaxy for work and used it in anger... I was shocked at how much was missing that I expect to be there. We should be setting many of these at some point during the week.

```
# Perf
database_engine_option_server_side_cursors: true
slow_query_log_threshold: 5
enable_per_request_sql_debugging: true
nginx_x_accel_redirect_base: /_x_accel_redirect

# watchdog library too?
watch_tools: 'auto'
watch_job_rules: 'auto'

# Admin convenience
allow_path_paste: true
library_import_dir: /data/library
enable_quotas: true
cleanup_job: onerror
allow_user_deletion: true
allow_user_impersonation: true

# user convenience
show_welcome_with_login: true
expose_user_name: true
expose_dataset_path: true
expose_potentially_sensitive_job_metrics: true

# Other
outputs_to_working_directory: true
```
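With the galaxyproject.galaxy role these options would go under `galaxy_config.galaxy` in group_vars, merged into the existing section. A sketch with a subset of them follows; `watch_tools` is left out because it mainly helps when tools are dropped into a directory. `nginx_x_accel_redirect_base` only works if nginx also has a matching `internal` location (e.g. `location /_x_accel_redirect { internal; alias /; }`), which is what the "file sending" check is about.

```yaml
# group_vars/galaxyservers.yml (sketch; a subset of the options above)
galaxy_config:
  galaxy:
    # Perf
    database_engine_option_server_side_cursors: true
    slow_query_log_threshold: 5
    # Needs a matching `internal` location in the nginx server config
    nginx_x_accel_redirect_base: /_x_accel_redirect
    # Admin convenience
    library_import_dir: /data/library
    enable_quotas: true
    cleanup_job: onerror
    # Other
    outputs_to_working_directory: true
```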

nsoranzo commented 3 years ago

True, I was also comparing the outcome of the ansible-galaxy tutorial with what I have in production and noticed some of these major missing bits.

natefoo commented 3 years ago

templates/galaxy/config/object_store_conf.xml in the Distributed Object Storage tutorial should be a .xml.j2.
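A sketch of the templated variant, following the pattern the training uses for other config files: files listed in `galaxy_config_templates` are rendered with Jinja2 (so variables can be used inside them), unlike `galaxy_config_files`, which are copied verbatim. Merge into the existing `galaxy_config` rather than duplicating the key.

```yaml
# group_vars/galaxyservers.yml (sketch)
galaxy_config:
  galaxy:
    object_store_config_file: "{{ galaxy_config_dir }}/object_store_conf.xml"

galaxy_config_templates:
  - src: templates/galaxy/config/object_store_conf.xml.j2
    dest: "{{ galaxy_config.galaxy.object_store_config_file }}"
```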

natefoo commented 3 years ago

In Running Jobs on Remote Resources with Pulsar the variable galaxy_server_url should be named galaxy_server_hostname or galaxy_server_address or something similar, since it's an FQDN (or IP) rather than a URL.

Slugger70 commented 3 years ago

From the Slack:

Note for "Galaxy Monitoring with Reports". Step 4 should put the location of the reports app in the templates/nginx/galaxy.j2 file, not in group_vars/galaxyservers.yml

hexylena commented 3 years ago

cgroup-tools missing from job metrics.
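A sketch of the missing step, assuming Debian/Ubuntu hosts: the cgroup job metrics plugin appears to rely on the `cgget` utility, which is shipped in the `cgroup-tools` package there.

```yaml
# Sketch: install cgget for the cgroup job metrics plugin (Debian/Ubuntu).
- name: Install cgroup-tools for the cgroup job metrics plugin
  ansible.builtin.apt:
    name: cgroup-tools
    state: present
```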

hexylena commented 3 years ago

tiaas

We next need to configure this plugin in our job configuration (files/galaxy/config/job_conf.xml): Should be templates/galaxy/config/job_conf.xml.j2 to match rest of training?

reports

tip box on how to secure reports
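A possible shape for that tip box: create an htpasswd file with `community.general.htpasswd` (which needs the `passlib` Python library on the target), then add `auth_basic "Galaxy Reports";` and `auth_basic_user_file /etc/nginx/passwd;` to the Reports location in templates/nginx/galaxy.j2. The path, user name and `vault_reports_password` variable are hypothetical.

```yaml
# Sketch: basic auth credentials for the Reports app behind nginx.
- name: Create htpasswd file for the Reports app
  community.general.htpasswd:
    path: /etc/nginx/passwd
    name: admin
    password: "{{ vault_reports_password }}"   # hypothetical vault variable
    owner: root
    group: www-data      # nginx worker group on Debian/Ubuntu
    mode: "0640"
```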

natefoo commented 3 years ago

GxIT training breaks Singularity training (the container_resolvers_conf.xml is invalid for Docker).

hexylena commented 3 years ago

So what exactly is a “sensible” value for this? Currently I am using data managers for a select number of references. My biggest item as of now is a 32GB RNAStar index of HG38.

On my production machine I have a 100 GB cache (todo: xref playbooks).
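A sketch of documenting that value in the playbook, assuming the galaxyproject.cvmfs role exposes the client cache size as `cvmfs_quota_limit` in megabytes (it ends up as `CVMFS_QUOTA_LIMIT` in the client config). Placing it in group_vars/all.yml follows the tutorial text.

```yaml
# group_vars/all.yml (sketch; variable name and unit are assumptions)
cvmfs_quota_limit: 100000   # ~100 GB client cache, in MB
```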

mvdbeek commented 3 years ago

watch_tools: 'auto'

That's only needed if you dump tools into a directory and expect new tools to show up. I wouldn't enable this on a production instance.

torfinnnome commented 3 years ago

Singularity and volume binding: When adding more object stores, it seems job_conf:$singularity_defaults is not populated with these new paths?

Fix: Add a singularity_volumes parameter to job_conf.xml, to include the new data volume(s): https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/config/sample/job_conf.xml.sample_advanced#L566

(Or make Galaxy add these automagically?)
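Since job_conf.xml is already templated, one option is to keep the list of bind mounts in group_vars and render it into the destination as `<param id="singularity_volumes">{{ galaxy_singularity_volumes | join(',') }}</param>`. This sketch assumes `$defaults` expands to Galaxy's default mounts, as in the sample file linked above; `galaxy_singularity_volumes` and the path are hypothetical.

```yaml
# group_vars/galaxyservers.yml (sketch; galaxy_singularity_volumes is hypothetical)
# Every extra object store path must be bind-mounted into the container,
# otherwise jobs cannot read inputs from or write outputs to it.
galaxy_singularity_volumes:
  - "$defaults"    # keep Galaxy's default bind mounts
  - "/data2:rw"    # illustrative path of an additional object store
```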

hexylena commented 3 years ago

from @Slugger70 on email

Erlang and Rabbit have a very “interesting” relationship. Different versions of RabbitMQ are very dependent on a particular version of Erlang. I have to pin the versions to be installed in my playbooks as the defaults don’t always work.

can you share those pins back and let's get them in the playbooks so users don't have these issues? (if that's possible)

cat-bro commented 2 years ago

I'm happy to take on making some small changes to the ephemeris tutorial.

(1) Switch the installed tool back to bwa and choose the tool for the testing step from any of the installed tools (bwa, bowtie2, bam_filter, etc.).

(2) Feedback comment: "The flow of the tutorial feels awkward in places - you extract the workflow but then install a tool singly before going back to the extracted .yml to do a batch." I'm wondering whether switching the order of steps from the current "workflow-to-tools", "install one tool", "install workflow tools" to "install one tool", "workflow-to-tools", "install workflow tools" would help with this.

(3) Feedback comments: "Not directly related to this tutorial but coming from the previous Galaxy setup tutorials, I'm left thinking - what happened to Ansible and the concept of reinstalling the entire Galaxy in one playbook?" and "It could be explained how to include these tasks in the ansible playbook (if possible) in the case of a full re-installation of Galaxy. Or maybe better separate the two steps..." I remember feeling the same way when I first took Galaxy admin courses. Now, having spent more time with Galaxy, I see tool management as something separate that I would not include in a set of infrastructure playbooks. Maybe to address this there could be a slide about the Galaxy API and how there are lots of things we want to be able to do outside of the main infrastructure setup.

hexylena commented 2 years ago

Fantastic! That sounds great! I like the new ordering that's proposed

natefoo commented 2 years ago

maybe add https://docs.galaxyproject.org/en/latest/admin/nginx.html#receiving-files-with-nginx too? it's better for performance

Chunked uploading fixes this for the UI. The cases where this would still be useful are scripted uploads and, if you set it up for the job files API (not sure if we documented this anywhere but the .org configs have it), Pulsar transfers. I think we would want a means for dynamically compiling the upload module before adding this as well since our nginx packages with the static module are not well maintained.

natefoo commented 2 years ago

check that the file sending we added actually works

I used to have manual verification with this using wget/curl in the salt lake version of the tutorial, I'll try to dig it up and see if we can automate it.

mvdbeek commented 2 years ago

Chunked uploading fixes this for the UI.

It's still not great for performance since the individual chunks still need to pass through the web handlers, the old upload module or https://github.com/pgaertig/nginx-big-upload are better for overall performance.

hexylena commented 2 years ago

Chunked uploading fixes this for the UI

the performance was complete garbage at EU. Nginx would buffer it once, uwsgi would re-buffer it again to a different location, then the chunked module would reassemble it. Swapping to nginx made a massive difference in web handler responsiveness during big uploads (since we were thrashing our disk less too)

cat-bro commented 2 years ago

I think we would want a means for dynamically compiling the upload module before adding this as well since our nginx packages with the static module are not well maintained.

Oz has a local role (ubuntu-oriented) for this that we use alongside the galaxyproject.nginx role: https://github.com/usegalaxy-au/infrastructure/tree/master/roles/nginx-upload-module

natefoo commented 2 years ago

@cat-bro awesome, thank you!

natefoo commented 2 years ago

validation for nginx config in nginx role. @natefoo

galaxyproject/ansible-nginx#11

Slugger70 commented 2 years ago

The version of rabbit that we use needs to be set to 3.8.16

from @Slugger70 on email

Erlang and Rabbit have a very “interesting” relationship. Different versions of RabbitMQ are very dependent on a particular version of Erlang. I have to pin the versions to be installed in my playbooks as the defaults don’t always work.

can you share those pins back and let's get them in the playbooks so users don't have these issues? (if that's possible)

We need to override the version of rabbitmq that we install to the latest one (3.8.16) that matches the new default erlang (24.x) install in Ubuntu 20.04. Otherwise any apt update && apt upgrade will install the latest erlang no matter which version we have pinned in the playbook.

I will add the version to the Pulsar tutorial.

See: https://www.rabbitmq.com/which-erlang.html for details.
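A sketch of pinning and holding the packages with Ansible built-ins so a routine `apt upgrade` does not move them. The exact Debian package revision (`3.8.16-1`) and holding only `erlang-base` are assumptions; check `apt-cache policy rabbitmq-server` on the target and the compatibility table linked above.

```yaml
# Sketch: install a specific RabbitMQ release and hold it (and Erlang).
- name: Install pinned RabbitMQ
  ansible.builtin.apt:
    name: "rabbitmq-server={{ rabbitmq_version }}"
    state: present
  vars:
    rabbitmq_version: "3.8.16-1"   # assumption: Debian revision suffix

- name: Hold RabbitMQ and Erlang at their installed versions
  ansible.builtin.dpkg_selections:
    name: "{{ item }}"
    selection: hold
  loop:
    - rabbitmq-server
    - erlang-base
```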

natefoo commented 2 years ago

@Slugger70 I wonder if we should use one of the repos in the RabbitMQ install docs for Debian/Ubuntu to install an updated Erlang? They make it sound like it should install the correct Erlang for the selected RabbitMQ if you're using Cloudsmith (or maybe Packagecloud?), but maybe it doesn't work so magically as they imply.

natefoo commented 2 years ago

Running Jobs on Remote Resources with Pulsar - job_metrics_config_file option missing #2302

WIP, please comment: galaxyproject/ansible-galaxy#133

If this looks like the way forward we can just default pulsar_job_metrics_plugins to galaxy_job_metrics_plugins after updating the Pulsar role to use the new syntax and write YAML configs instead of XML.
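A sketch of what that default could look like, assuming the YAML-style plugin list proposed in galaxyproject/ansible-galaxy#133 and a Pulsar role updated to accept the same syntax; both are still WIP, so names may change. The plugin types listed are Galaxy's standard job metrics plugins.

```yaml
# group_vars (sketch; depends on the WIP role changes)
galaxy_job_metrics_plugins:
  - type: core
  - type: cpuinfo
  - type: meminfo
  - type: uname
  - type: cgroup

# Reuse the same list so Pulsar jobs collect the same metrics
pulsar_job_metrics_plugins: "{{ galaxy_job_metrics_plugins }}"
```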
