galaxycomputationalchemistry / galaxy-tools-compchem

:mega: Galaxy Tools for Computational Chemistry
Apache License 2.0

Add GPU option to gmx_sim tool #85

Closed: simonbray closed this issue 3 years ago

simonbray commented 3 years ago

@gmauro @bgruening it would be good to have some feedback from you on this (only when you have time, next week is fine). There are a few different areas to think about:

  1. Installation - I compiled GROMACS successfully on a GPU node, with all binaries suffixed with _gpu. There is a conda package available on my channel. To keep it simple at first, I suggest we manually install into the existing env with conda install gmx_gpu -c simonbray (spelled out in the shell snippet after this list).

  2. Wrapper - I'm hoping there is some variable available to Galaxy in the tool environment, so it knows whether a GPU is available and the wrapper can switch between the gmx and gmx_gpu commands. If not, there is probably some workaround, but I'm not sure what the best option is here.

  3. Scheduling - this seems complicated. We have far more CPUs than GPUs, but the tool runs much faster on GPUs (ca. 6x faster in my tests, though this will vary a lot). So ideally there should be some queuing system for the GPUs before jobs get sent to CPUs instead, but I have no idea how to implement this. :( At least to start with, selecting the job destination manually in the user preferences would be fine as well.
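For point 1, a minimal shell snippet spelling out the suggested manual install (package and channel names are the ones mentioned above):

```sh
# Install the GPU-enabled GROMACS build from the simonbray conda channel into
# whatever conda environment is currently active (i.e. the existing tool env).
conda install -c simonbray gmx_gpu
```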

bgruening commented 3 years ago
Installation - I compiled GROMACS successfully on a GPU node, with all binaries suffixed with _gpu. There is a conda package available on my channel. To keep it simple at first, I suggest we manually install into the existing env with conda install gmx_gpu -c simonbray.

Ok, that is cool. Against which CUDA version have you compiled it?

Wrapper - I'm hoping there is some variable available to Galaxy in the tool environment, so it knows whether a GPU is available and the wrapper can switch between the gmx and gmx_gpu commands. If not, there is probably some workaround, but I'm not sure what the best option is here.

Please use GPU_ENABLED for the moment. We will set this env var together with the other env vars here: https://github.com/usegalaxy-eu/infrastructure-playbook/blob/master/files/galaxy/dynamic_rules/usegalaxy/tool_destinations.yaml#L22
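As a rough sketch (the 0/1 value convention is an assumption here, and the _gpu suffix is the one described above), the wrapper's command section could branch on that variable like this:

```sh
# Sketch only: GPU_ENABLED is assumed to be exported by the job destination,
# with a value of '1' on GPU nodes and '0' (or unset) elsewhere.
if [ "${GPU_ENABLED:-0}" = '1' ]; then
    gmx='gmx_gpu'
else
    gmx='gmx'
fi
# The rest of the command line then calls "$gmx" instead of a hard-coded
# binary; a placeholder invocation:
"$gmx" --version
```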

Scheduling - this seems complicated. We have far more CPUs than GPUs, but the tool runs much faster on GPUs (ca. 6x faster in my tests, though this will vary a lot). So ideally there should be some queuing system for the GPUs before jobs get sent to CPUs instead, but I have no idea how to implement this. :( At least to start with, selecting the job destination manually in the user preferences would be fine as well.

Puh, this is complicated. It might be possible with a pure Condor cluster, but with the Pulsar endpoints this is currently not possible, I think. However, we could implement some poor-human's scheduling: extend the tool form (on the admin side) so that it lets the user choose the GPU or CPU setting.

simonbray commented 3 years ago
Installation - I compiled GROMACS successfully on a GPU node, with all binaries suffixed with _gpu. There is a conda package available on my channel. To keep it simple at first, I suggest we manually install into the existing env with conda install gmx_gpu -c simonbray.

Ok, that is cool. Against which CUDA version have you compiled it?

10.1.

Wrapper - I'm hoping there is some variable available to Galaxy in the tool environment, so it knows whether a GPU is available and the wrapper can switch between the gmx and gmx_gpu commands. If not, there is probably some workaround, but I'm not sure what the best option is here.

Please use GPU_ENABLED for the moment. We will set this env var together with the other env vars here: https://github.com/usegalaxy-eu/infrastructure-playbook/blob/master/files/galaxy/dynamic_rules/usegalaxy/tool_destinations.yaml#L22

So we can create two environments here, one with GPU_ENABLED set to true and one set to false?

Scheduling - this seems complicated. We have far more CPUs than GPUs, but the tool runs much faster on GPUs (ca. 6x faster in my tests, though this will vary a lot). So ideally there should be some queuing system for the GPUs before jobs get sent to CPUs instead, but I have no idea how to implement this. :( At least to start with, selecting the job destination manually in the user preferences would be fine as well.

Puh, this is complicated. It might be possible with a pure Condor cluster, but with the Pulsar endpoints this is currently not possible, I think. However, we could implement some poor-human's scheduling: extend the tool form (on the admin side) so that it lets the user choose the GPU or CPU setting.

Human control is fine, at least for now. :+1:

simonbray commented 3 years ago

It would be really good to have a solution here; it doesn't need to be perfect. In the latest commit I replaced the GPU_ENABLED variable with a hidden user-controlled option, so the scheduling can simply be managed by the user, as @bgruening suggested.

What is still not clear to me is whether it is possible to send the job to different environments based on the value of this option. Is that the case? If not, my next idea is a new tool wrapper (probably also hidden in the UI).

simonbray commented 3 years ago

Error: The set-env command is disabled. Please upgrade to using Environment Files or opt into unsecure command execution by setting the ACTIONS_ALLOW_UNSECURE_COMMANDS environment variable to true. For more information see: https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/

Probably ok to set this variable?
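For reference, the environment-files replacement that the changelog recommends looks roughly like this inside a workflow step (VAR_NAME and value are placeholders, not variables used in this repo's CI):

```sh
# Instead of the deprecated `echo "::set-env name=VAR_NAME::value"`,
# append the assignment to the file exposed via $GITHUB_ENV; later steps in
# the same job then see VAR_NAME in their environment.
echo "VAR_NAME=value" >> "$GITHUB_ENV"
```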

simonbray commented 3 years ago

How does this look to you @bgruening?

bgruening commented 3 years ago

Looks good to me. Now we "just" need to configure Galaxy :)