Running yalla without CUDA

axelalmet commented 4 years ago

Hi all,

I was recommended yalla by a work colleague and really like the look of it!

However, while trying to get yalla code to run, I realised that I cannot run any of the code, because my work computer and laptop are both Apple, which use AMD GPUs. Installing CUDA is therefore not an option on either machine, which also means that I don't have access to the thrust library. Are there any alternative methods to running this code? I also tried using a translator like coriander, but it doesn't recognise types like float3, and it still doesn't solve the problem that I need to have thrust installed. Any advice or suggestions on how to fix this would be greatly appreciated. Thank you for your help!

Best wishes, Axel.

antmatyjajo commented 4 years ago

Hi there,

> I was recommended yalla by a work colleague and really like the look of it!

Great to hear you're interested in the project! Thanks for getting in touch.

> I cannot run any of the code, because my work computer and laptop are both Apple, which use AMD GPUs

Yes, currently yalla has a hard dependency on CUDA. Personally, I think it would be great to eventually have a cross-platform version that runs on non-NVIDIA hardware. Unfortunately, I haven't had the time to dedicate to this.

> Are there any alternative methods to running this code?

I would recommend that you try AMD's hipify tool first; I've heard good things from colleagues: https://github.com/ROCm-Developer-Tools/HIP

It could be a good place to start from. Some CUDA features are missing, but the core is there. The only obvious potential sticking point that I can see is that HIP doesn't support dynamic parallelism, but as far as I know this shouldn't be an issue for yalla.
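To give a rough feel for what such a translation involves, here is a toy sketch in plain C++ that renames a few CUDA runtime identifiers to their HIP counterparts. It is only an illustration of the idea: the function `rename_cuda_to_hip` is hypothetical, it is not how the hipify tools are implemented, and real code needs far more than a textual rename.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Toy illustration only (hypothetical helper, not part of HIP or yalla):
// rename CUDA runtime identifiers such as cudaMalloc to hipMalloc and
// swap the runtime header for the HIP one.
std::string rename_cuda_to_hip(const std::string& source) {
    static const std::regex header(R"(cuda_runtime\.h)");
    // Match "cuda" at a word boundary followed by a capital letter,
    // e.g. cudaMalloc, cudaMemcpyHostToDevice, cudaFree.
    static const std::regex api(R"(\bcuda([A-Z][A-Za-z0-9_]*))");
    std::string out = std::regex_replace(source, header, "hip/hip_runtime.h");
    return std::regex_replace(out, api, "hip$1");
}

// Small usage example.
inline void example_usage() {
    assert(rename_cuda_to_hip("cudaFree(p);") == "hipFree(p);");
}
```

Kernel launch syntax, Thrust calls and heavily templated code are not covered by a rename like this, which is why trying the actual tools on the yalla sources is the real test.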

> I don't have access to the thrust library

One option could be the ROCm (AMD) backend for thrust here: https://github.com/ROCmSoftwarePlatform/rocThrust

Difficult to tell how complete it is, but again could be a good start.

> I also tried using a translator like coriander [...] Any advice or suggestions on how to fix this would be greatly appreciated

The HIP tools at least look like they're under active development.

From briefly looking over the docs, you'll need the ROCm platform (because of the thrust dependency) rather than being able to generate OpenCL-compatible code, which is unfortunate. On the other hand, tools like coriander that try to convert to OpenCL will be problematic in any case: we rely a lot on heavily templated C++, and as far as I know this type of code is tricky to convert automatically into OpenCL.

My advice would be to try the hipify tools above and see how you get on; we can take it from there. What do you think?

Cheers,

germannp commented 4 years ago

Hello Axel,

Glad you like our work and found your way here too :-)

Unfortunately, I am not aware of any way to run the current code without CUDA and an NVIDIA graphics card either. Actually, you most likely even need Linux: Apple is doing everything to keep CUDA away from macOS, and I have not yet got ya||a to work with CUDA on Windows (see #6).

Regarding porting the code to OpenCL or HIP, the float3 problem should be easy to fix using dtypes.cuh; possibly MAKE_PT(float3); already does the trick. However, the fact that the translator does not recognize this very basic CUDA data type does not make this road look very promising ... More promising would be to port the code manually, which leaves the Thrust problem. We use Thrust for fill, sort and reduce, which should be straightforward to implement and potentially easier than getting everything to work with ROCm and CUDA at the same time ...
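As a rough idea of how small those three operations are, here is a host-only sketch in standard C++. Everything in it is an assumption for illustration: `Float3`, `fill_all`, `reduce_sum` and `sort_by_key` are hypothetical names, not what dtypes.cuh or the solvers actually contain, and the by-key form of the sort is a guess at how the sort might be used. A real port would need device-side versions of these.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for CUDA's float3 (not yalla's dtypes.cuh).
struct Float3 {
    float x, y, z;
};

inline Float3 operator+(Float3 a, Float3 b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z};
}

// Stand-in for a fill: set every element to `value`.
template<typename T>
void fill_all(std::vector<T>& data, const T& value) {
    std::fill(data.begin(), data.end(), value);
}

// Stand-in for a reduce: fold all elements with operator+.
template<typename T>
T reduce_sum(const std::vector<T>& data, T init) {
    return std::accumulate(data.begin(), data.end(), init);
}

// Stand-in for a sort by key: reorder `keys` ascending and apply the
// same permutation to `values`.
template<typename K, typename V>
void sort_by_key(std::vector<K>& keys, std::vector<V>& values) {
    assert(keys.size() == values.size());
    std::vector<std::size_t> order(keys.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
        [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    std::vector<K> sorted_keys;
    std::vector<V> sorted_values;
    sorted_keys.reserve(keys.size());
    sorted_values.reserve(values.size());
    for (std::size_t i : order) {
        sorted_keys.push_back(keys[i]);
        sorted_values.push_back(values[i]);
    }
    keys = std::move(sorted_keys);
    values = std::move(sorted_values);
}
```

The hard part of a port is running these efficiently on the GPU, not writing them; the sketch only shows that the required surface is small.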

However, porting the code is quite a lot of work for a first look, and using CUDA with Thrust is likely still the easiest way to program GPUs. Also, rewriting code without being able to run it would be quite a challenge ;-) So I would suggest you try to get your hands on an NVIDIA GPU. I assume it is easy to get yalla running on AWS, but I don't know whether you could run something like ParaView from there to check the results. @antmatyjajo, do you know a simple way? Maybe there is some deep learning infrastructure at your institute where you could play around a bit? A Jetson might also be an option if you don't need to work with huge models (I have wanted to try that for a while, as my MacBook with an NVIDIA GPU is getting more useless for CUDA with every update) ...

Cheers, Philipp

axelalmet commented 4 years ago

Hi @antmatyjajo and @germannp,

Thanks very much for your responses. Just to let you know, I am in the process of getting hipify installed and working on my work desktop's Linux virtual machine, to see if this is a viable way of running the yalla code. I will keep you updated on whether this works; if it does, that is perfectly good for me for the time being :).

Best wishes, Axel.

germannp / yalla

Running yalla without CUDA #19