JLBLine / WODEN

CUDA code designed to simulate interferometric data using point, Gaussian, and shapelet sky models
https://woden.readthedocs.io/en/latest/
Mozilla Public License 2.0

Migration to Setonix #52

Closed by marcinsokolowski 1 month ago

marcinsokolowski commented 2 months ago

Hi Jack, this is Marcin Sokolowski. I hope you are doing well. Could you add me as a contributor to WODEN? I've created a branch main_gpu_setonix with the CUDA functions replaced by generic gpu calls (e.g. cudaMemcpy -> gpuMemcpy) via a macro .h file, so this version can be compiled on both NVIDIA and Setonix. I'd like to push it as a branch main_gpu_setonix, but I don't have permissions, and I'm not sure how else to contact you about this. Once I push, I'll create a pull request so that you can inspect it. The version compiles on Setonix; CMakeLists.txt may just need a few tweaks to compile on both CUDA and HIP. Thanks, Marcin

JLBLine commented 2 months ago

Hey @marcinsokolowski, good to hear from you! @d3v-null had been thinking about trying a HIP port to get WODEN working on both NVIDIA and AMD. Is it really as simple as chucking in a macro somewhere? That's awesome if true. I'll invite you as a contributor now.

marcinsokolowski commented 2 months ago

Great, thanks! You can see the differences in https://github.com/JLBLine/WODEN/tree/main_gpu_setonix/. There are a couple of things:

- new headers: include/gpu_macros.h and gpu_fft.hpp
- src/: the .cu files are renamed to .cpp, which is required by HIP
- CMakeLists.txt can be changed so it compiles on both CUDA and HIP. I think the current version is not far from it, but I haven't tested back on NVIDIA yet ...

CMakeLists.txt - for now this is only for Setonix in this branch. I'll try to do the same as here: https://github.com/PaCER-BLINK-Project/imager/blob/main/CMakeLists.txt to make a single CMakeLists.txt for both platforms, and will try to test it back on NVIDIA in a free moment.

./build.sh - builds on Setonix. There are still a couple of hardcoded paths to things like mwalib and hyperbeam, which Dev has modules for, but I just compiled them manually (the same applies to CMakeLists.txt).

Current status: the code compiles on both platforms (macros like gpuFunction expand to cudaFunction or hipFunction depending on the compiler), but build.sh / CMakeLists.txt still require a bit of work ...

marcinsokolowski commented 2 months ago

Ok, just pushed small modifications to CMakeLists.txt which make it compile on both Setonix (./build.sh script) and NVIDIA (example in ./build_nvidia.sh script; cmake .. -DUSE_CUDA=ON followed by make should really be enough, the extra paths may just be needed on my laptop etc).
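A single CMakeLists.txt serving both backends typically just switches the GPU language on an option. A minimal sketch of that pattern, assuming CMake >= 3.21 for the built-in HIP language support; only the `-DUSE_CUDA=ON` option name comes from this thread, and the project/file names are hypothetical:

```cmake
# Illustrative sketch only, not WODEN's actual CMakeLists.txt.
cmake_minimum_required(VERSION 3.21)
project(gpu_sketch LANGUAGES CXX)

option(USE_CUDA "Build the GPU sources with CUDA instead of HIP" OFF)

set(GPU_SOURCES src/example_kernels.cpp)   # hypothetical file name

if(USE_CUDA)
    enable_language(CUDA)
    # The .cpp GPU sources are compiled as CUDA...
    set_source_files_properties(${GPU_SOURCES} PROPERTIES LANGUAGE CUDA)
else()
    enable_language(HIP)
    # ...or as HIP, without renaming any files.
    set_source_files_properties(${GPU_SOURCES} PROPERTIES LANGUAGE HIP)
endif()

add_executable(gpu_sketch ${GPU_SOURCES})
```

Marking the language per source file is what makes the .cu -> .cpp rename workable: the same files build under either toolchain.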

JLBLine commented 2 months ago

Very interesting. I'll have a quick look when I get a bit of free time, some time before end of week.

JLBLine commented 2 months ago

Hey @marcinsokolowski, just a heads up that I'm going to work on this actively tomorrow. I've just pushed a bunch of stuff to master, so your branch can't auto-merge anymore, apologies. But I'm keen to test out what you've done, and if it works, I'll try to replicate it on the current state of master. Can I ask: the macros and such that you got from the PaCER-BLINK project, are they open source, and/or did you write them? If it all pans out and we use them, I'd like to apportion correct credit.

JLBLine commented 2 months ago

Given the number of changes I made to master, it was cleaner to implement your macros in a new branch based off the latest release, v0.2.1. So I've created the branch compile_both_CUDA_and_HIP, which I've successfully built and run on an NVIDIA GPU. I've also updated the test suite, which now works in CUDA mode as well. Once Pawsey is back online, I'll try to make it run on Setonix too.

marcinsokolowski commented 2 months ago

Hey @JLBLine, sounds good. As for the macros, we developed them for the PaCER BLINK project; it was joint work between myself and Cristian Di Pietrantonio. The BLINK imager is currently public and available here: https://github.com/orgs/PaCER-BLINK-Project/repositories. The macros are in the astroio project (https://github.com/PaCER-BLINK-Project/astroio/tree/master/src), though I may have used a subset of those or added some others. I find it easier to have these macros "attached" to each project rather than in some "common library", which tends to be more painful than useful. You can credit Cristian and myself with a link to the astroio library (https://github.com/PaCER-BLINK-Project/astroio) and/or a citation to the paper: https://ui.adsabs.harvard.edu/abs/2024arXiv240513478S/abstract (hopefully soon on the PASA page too, it was accepted long ago). It's up to you; I'm glad our work on the PaCER imager was useful in helping other projects transition too, and I leave it to you how you credit it.

JLBLine commented 2 months ago

Excellent, thanks @marcinsokolowski. I'll link all the above on the home page and in the documentation. I've just got the compile_both_CUDA_and_HIP branch to compile and run on Setonix, so we're nearly there. I've added a script in templates/install_woden_setonix.sh with all my installation steps. It includes installing hyperbeam and getting copies of the FEE beam h5 files, so you'll want to edit it if you use it. Out of interest, were you planning on using WODEN, or just virtuously making it work with HIP?

marcinsokolowski commented 2 months ago

Sounds like you've got it well under control. I thought it might be a good way to get familiar with WODEN and try to use it at some point too, though I didn't really have a clear plan in my head yet. I also wanted to help you guys a bit, plus further increase my experience with the various issues that can be encountered during this kind of transition :)

JLBLine commented 1 month ago

Cool. Well, it was very helpful indeed, so thank you. I've actually made some modifications to gpu_macros.h that might be helpful to you. I found the GPU errors were reporting the file and line number from within gpu_macros.h itself, where the error wrapping was happening. With the modifications here https://github.com/JLBLine/WODEN/blob/compile_both_CUDA_and_HIP/include/gpu_macros.h, you get the file and line number of where the malloc (or whatever) was actually called from, rather than where the check was done.

marcinsokolowski commented 1 month ago

Indeed, this is a better way of doing it; already the first advantage of the collaboration :) - thanks for this! I'll apply the modification once I get back to changing our code.

JLBLine commented 1 month ago

Teamwork is the dream work, Marcin! I only knew how to do that because it came out of the RTS (ah, the memories). Righty-oh, this has all been merged into master now, so I'm closing this issue.