Closed: simonbray closed this issue 3 years ago.
> Installation - I compiled GROMACS successfully on a GPU node, with all binaries suffixed with `_gpu`. There is a conda package available on my channel. To keep it simple at first, I suggest we manually install into the existing env with `conda install gmx_gpu -c simonbray`.
Ok, that is cool. Against which CUDA version have you compiled it?
10.1.
> Wrapper - I'm hoping there is some variable available to Galaxy in the tool environment so it knows whether a GPU is available or not, so the wrapper can switch between the `gmx` and `gmx_gpu` commands. If not, there is probably some workaround, but I'm not sure what the best option is here.
Please use `GPU_ENABLED` for the moment. We will set this env var together with the other environment variables here: https://github.com/usegalaxy-eu/infrastructure-playbook/blob/master/files/galaxy/dynamic_rules/usegalaxy/tool_destinations.yaml#L22
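If it helps to make this concrete, the switch inside the wrapper's command section could be a plain shell branch on that variable. A minimal sketch — only `GPU_ENABLED`, `gmx`, and `gmx_gpu` come from this thread; the `true`/`false` values and variable name `GMX_BIN` are assumptions:

```shell
# Pick the GROMACS binary based on GPU_ENABLED, which the job
# destination is expected to export. Defaults to the CPU binary
# when the variable is unset ("false" fallback is an assumption).
if [ "${GPU_ENABLED:-false}" = "true" ]; then
    GMX_BIN="gmx_gpu"
else
    GMX_BIN="gmx"
fi
echo "Selected binary: $GMX_BIN"
```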
So we can create two possible envs here, one set to true and one to false?
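For what it's worth, a rough sketch of what those two environments might look like as Galaxy `job_conf.xml` destinations — the destination ids and the `condor` runner are invented for illustration; only the `GPU_ENABLED` variable comes from this thread:

```xml
<!-- Hypothetical job_conf.xml fragment: two destinations that differ
     only in the GPU_ENABLED variable exported into the job environment. -->
<destination id="condor_cpu" runner="condor">
    <env id="GPU_ENABLED">false</env>
</destination>
<destination id="condor_gpu" runner="condor">
    <env id="GPU_ENABLED">true</env>
</destination>
```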
> Scheduling - this seems complicated. We have far more CPUs than GPUs, but the tool runs much faster on GPUs (ca. 6x faster in my tests, though this will vary a lot). So ideally there should be some queuing system for the GPUs before jobs get sent to CPUs instead, but I have no idea how to implement this. :( At least to start with, selecting the job destination manually in the user preferences would be fine as well.
Puh, this is complicated. It might be possible with a pure HTCondor cluster, but with the Pulsar endpoints this is currently not possible, I think. However, we could implement some poor man's scheduling: add (on the admin side) an extended tool form that lets you choose the GPU or CPU setting.
Human control is fine, at least for now. :+1:
It would be really good to have a solution here; it doesn't need to be perfect. In the latest commit I replaced the variable `GPU_ENABLED` with a hidden user-controlled option, so the scheduling can simply be managed by the user, like @bgruening suggested.
What is still not clear to me is whether it is possible to send the job to different environments based on the value of this option. Is this the case? If not, my next idea is a new tool wrapper (probably also hidden in the UI).
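If routing on the option's value does turn out to be possible (e.g. via a Galaxy dynamic destination rule, which is a Python function), the core logic might be as small as this sketch. The parameter name `use_gpu` and the destination ids `condor_gpu`/`condor_cpu` are invented for illustration; a real rule would read the value from the job's parameters:

```python
# Hypothetical routing helper for a Galaxy dynamic destination rule.
# "use_gpu" stands in for the hidden user-controlled tool option;
# the destination ids are made up for this sketch.
def choose_destination(use_gpu):
    # Tool parameter values often arrive as strings, so normalize first.
    if str(use_gpu).strip().lower() in ("true", "yes", "1"):
        return "condor_gpu"
    return "condor_cpu"

print(choose_destination("true"))
print(choose_destination("false"))
```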
> Error: The `set-env` command is disabled. Please upgrade to using Environment Files or opt into unsecure command execution by setting the `ACTIONS_ALLOW_UNSECURE_COMMANDS` environment variable to `true`. For more information see: https://github.blog/changelog/2020-10-01-github-actions-deprecating-set-env-and-add-path-commands/
Probably ok to set this variable?
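The error message also offers the Environment Files alternative, which avoids the unsecure-commands opt-in entirely. A sketch of the replacement pattern — in a real Actions run the `GITHUB_ENV` file path is provided by the runner; here a temp file stands in so the snippet runs anywhere:

```shell
# Simulate the runner-provided environment file outside Actions.
GITHUB_ENV="${GITHUB_ENV:-$(mktemp)}"

# Old, now-disabled form:
#   echo "::set-env name=GPU_ENABLED::true"
# New form: append KEY=VALUE lines to the environment file; later
# workflow steps then see GPU_ENABLED in their environment.
echo "GPU_ENABLED=true" >> "$GITHUB_ENV"

cat "$GITHUB_ENV"
```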
How does this look to you @bgruening?
Looks good to me. Now we "just" need to configure Galaxy :)
@gmauro @bgruening it would be good to have some feedback from you on this (only when you have time, next week is fine). There are a few different areas to think about:
Installation - I compiled GROMACS successfully on a GPU node, with all binaries suffixed with `_gpu`. There is a conda package available on my channel. To keep it simple at first, I suggest we manually install into the existing env with `conda install gmx_gpu -c simonbray`.
Wrapper - I'm hoping there is some variable available to Galaxy in the tool environment so it knows whether a GPU is available or not, so the wrapper can switch between the `gmx` and `gmx_gpu` commands. If not, there is probably some workaround, but I'm not sure what the best option is here.
Scheduling - this seems complicated. We have far more CPUs than GPUs, but the tool runs much faster on GPUs (ca. 6x faster in my tests, though this will vary a lot). So ideally there should be some queuing system for the GPUs before jobs get sent to CPUs instead, but I have no idea how to implement this. :( At least to start with, selecting the job destination manually in the user preferences would be fine as well.