TuringLang / AdvancedHMC.jl

Robust, modular and efficient implementation of advanced Hamiltonian Monte Carlo algorithms
https://turinglang.org/AdvancedHMC.jl/
MIT License
237 stars, 41 forks

Basic CUDA support #255

Closed (treigerm closed 3 years ago)

treigerm commented 3 years ago

This PR is mostly based on code @xukai92 wrote; it is an initial draft for merging those changes into AHMC. I have run some local sanity checks and it runs without problems. In the future, we might want to make this implementation more generic so that it works not just for CUDA but also for other hardware.

TODOs

xukai92 commented 3 years ago

Thanks for the PR Tim!

> Testing. What is the best way to test GPU code? Does GitHub support running automated tests on GPUs?

There is https://github.com/JuliaGPU/buildkite from JuliaGPU, but I've never tried it. I also don't know whether this approach is free, since it requires GPU machines. It would be great if you could take a look and let me know what's required to set this up.

> Is there a way to check in the test suite whether the current machine is CUDA enabled?

CUDA.functional() (https://github.com/JuliaGPU/CUDA.jl/blob/master/src/initialization.jl#L9-L23)?

treigerm commented 3 years ago

> There is https://github.com/JuliaGPU/buildkite from JuliaGPU, but I've never tried it. I also don't know whether this approach is free, since it requires GPU machines. It would be great if you could take a look and let me know what's required to set this up.

I will look into it!

treigerm commented 3 years ago

I added a very basic test that just checks whether the code runs without errors. I also added a check so that the GPU tests only run if a GPU is available.
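For readers who want to see what such a guard might look like, here is a minimal sketch, not the code from this PR. It assumes CUDA.jl is installed (loading the package does not itself require a GPU) and uses `CUDA.functional()`, as suggested earlier in the thread, to decide whether to run. The helper name `run_gpu_tests` and the smoke test inside it are hypothetical and only illustrate the pattern.

```julia
using Test
using CUDA  # assumption: CUDA.jl is installed; loading it works without a GPU

# Hypothetical helper (not part of AdvancedHMC.jl): runs a tiny GPU smoke test
# only when CUDA is usable, and reports whether the tests actually ran.
function run_gpu_tests()
    if !CUDA.functional()
        @info "CUDA is not functional on this machine; skipping GPU tests."
        return false
    end
    @testset "CUDA smoke test" begin
        x = CUDA.rand(Float32, 16)        # allocate random data on the device
        y = x .+ 1f0                      # broadcast executes on the GPU
        @test y isa CuArray               # result stays on the device
        @test Array(y) ≈ Array(x) .+ 1f0  # copy back and compare on the host
    end
    return true
end

ran = run_gpu_tests()
```

On a machine without a working GPU this only logs a message and returns `false`, so the same test file can run unchanged on CPU-only CI.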

The README at https://github.com/JuliaGPU/buildkite mentions that the JuliaGPU organization is able to provide some infrastructure to run GPU tests. I will write a message on the GPU Slack channel to ask whether we could get access to it.

treigerm commented 3 years ago

Is there anything that would speak against merging this already, i.e. are there things that still need to be done? I can add a section to the README explaining how to use AHMC with CUDA if that's helpful.

xukai92 commented 3 years ago

> Is there anything that would speak against merging this already, i.e. are there things that still need to be done?

No. It hasn't proceeded only because this PR is currently marked as a "Draft", so I thought you were still working on it :D

> I can add a section to the README explaining how to use AHMC with CUDA if that's helpful.

Maybe a very simple one is enough for now.

treigerm commented 3 years ago

> No. It hasn't proceeded only because this PR is currently marked as a "Draft", so I thought you were still working on it :D

Ah I see, that makes sense :D

I will add a very short paragraph to the README and then make it a proper PR.

xukai92 commented 3 years ago

Can you change it to "Ready for review" once you're done? There are also some merge conflicts in Project.toml that need to be resolved.

treigerm commented 3 years ago

Will do so!

treigerm commented 3 years ago

I'm not sure whether the test failure is related to this PR. The fact that it only happens for one of the CI runs might be an indication that it's a stochastic thing.

xukai92 commented 3 years ago

After retriggering, all tests pass. I'm merging this PR now. Thanks for the contribution.