elliottslaughter opened 2 years ago
Update: I upgraded to ROCm/HIP 4.5.0 (from 4.2.0), which resolved the issue with symbols. The remaining test failures appear to be some sort of numerical issue.
This seems like a worthy goal. I do not have an AMD GPU to test on, and am frankly out of my depth, so we will need others on the dev team to test!
Your macro-based approach looks good, at least as a way to get HIP started with minimal code changes. It unifies the headers for the two architectures. (Note that you now `#include` the fft and complex headers unconditionally, where previously you didn't; I hope this isn't the cause of the math errors.) Certainly better than the search-and-replace recommended by blogs like https://www.lumi-supercomputer.eu/preparing-codes-for-lumi-converting-cuda-applications-to-hip/ !
To complete a PR, you would want to add a section to README.md (or a linked doc file) about compiling with HIP, so everyone knows how to test what you've added.
Try to keep the makefile changes minimal, with a simple flag, e.g. `make check -DHIP`.
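One common way to wire up such a switch (a hedged sketch only; the variable name `HIP` and the compiler choices here are illustrative, not what the branch actually uses):

```makefile
# Hypothetical toggle: "make check HIP=1" selects the HIP toolchain.
ifeq ($(HIP),1)
  CXX       = hipcc
  CXXFLAGS += -D__HIP_PLATFORM_AMD__
else
  CXX       = nvcc
endif
```

Note that `make` itself does not accept `-D` options; a `VAR=value` override on the command line is the idiomatic equivalent.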
Now, regarding the math: if you can post a single command line for a test that fails (you say it's a math-output failure rather than a crash?), maybe someone with an AMD GPU can help us out?
Thanks, Alex
I've been doing some digging and think I've found the root cause, which seems to be related to the host-side code and not to the GPU at all.
My debugging shows that we're not taking the following branch because `MAX_NF` is 0. `MAX_NF`, meanwhile, is defined as `(BIGINT)1e11`, as you can see below (note this is `master`, not my branch). And `BIGINT` is defined as `int`, which is 32 bits in most implementations.

To be honest, I'm not sure how this has ever worked. If you compare `1e11` and `1<<32` (one more than the largest value an `unsigned int` can hold; the limit for signed `int` is about 2x smaller):
```
>>> 1e11
100000000000.0
>>> 1<<32
4294967296
```
You can see that `1e11` clearly overflows. You need to get down to about `1e9` to find something that actually fits in 32 bits. I have confirmed that changing `1e11` to `1e9` does make the test pass, so the failure never had anything to do with the GPU in the first place.
I think the potential solutions are decreasing `MAX_NF` to `INT_MAX` (or similar), or changing `BIGINT` to a 64-bit type if you really want to be able to represent `1e11` in that number. Either way, it's not really a HIP issue, so it should probably be fixed in a separate PR.
Hi @elliottslaughter, thanks for reporting this bug and the proposed solutions. This is my mistake. In FINUFFT (the CPU version), `BIGINT` is defined as `int64_t` and everything makes sense. I will create a PR to fix it.
Hi Elliott, that's exciting re getting HIP going. Looks like pulling in 1d failed... sorry... but it's a worthwhile part of the code to include. Good luck & thanks for your collaboration on this. -A
I'm currently working on diagnosing the issue that happens on Crusher with newer AMD hardware. Currently (as of my most recent push), the entire test suite works on Spock (with previous-generation AMD hardware).
After AMD works in this branch, I'll get back to NVIDIA and the other issues identified in this thread.
@ahbarnett I wanted to follow up on your comment https://github.com/flatironinstitute/cufinufft/pull/134#pullrequestreview-883277905:

> `make check` ... does not have official pass/fail detection; it's a matter of if the tests compile and run (no math check is actually done).

Do the Python tests have such checks? I ask because so far I've been using `make check` for my AMD/HIP tests, and if they're not checked, it means I haven't actually validated correctness. If Python does, that may be a path forward, though I'll have to work around the lack of PyCUDA on AMD.
> Do the Python tests have such checks?

They do. I would run `python3 -m pytest python/cufinufft/tests` to verify correctness.
This is a work-in-progress branch to add support for AMD GPUs via HIP, a very nearly CUDA-compatible API by AMD. While this isn't ready to go yet, it's far enough along that I wanted to get feedback on the approach I'm taking.

The basic idea of HIP is to provide CUDA-compatible APIs that are identical except for naming. In most cases you can find-and-replace `cuda` for `hip` and things will work. Rather than literally doing that transformation to the code (the way AMD seems to want you to do it), I'm using a header wrapper around `cuda.h`/`hip/hip_runtime.h`. There I add translations with `#define cudaX hipX` for each `X` API call used in the code. The advantage of going this route is that the vast majority of the code does not need to change at all, at least for a functionally correct version. I haven't looked at performance at all yet, and there may be more work to do there.

I also made a few minor changes to the build system, mostly to provide enough configuration flexibility so that site files are sufficient to build against HIP. I provide an example site in `sites/make.inc.olcf_spock` for Spock.

Status: running `make site=olcf_spock check` currently gives me failures in the following tests; all other tests are passing. Separately, `pycuda` (from `python/cufinufft/requirements.txt`) is a problem because it doesn't exist in HIP (PyCUDA has not been ported, and I'm not sure whether it will be). I'm not sure how to deal with this at the moment, since I'm not sure `requirements.txt` has an optional-dependency syntax.

I'd appreciate a review of the basic approach I'm using, and if anyone has advice on how to address the remaining test failures, that would be helpful. Thanks!
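For concreteness, the wrapper-header approach described above might look roughly like this. This is a sketch with illustrative names only: the actual file, platform guard, and macro list in the branch will differ, and only a handful of representative API mappings are shown.

```cpp
// cuda_or_hip.h (hypothetical name): include this everywhere instead of
// including <cuda_runtime.h> or <hip/hip_runtime.h> directly.
#ifdef __HIP_PLATFORM_AMD__
  #include <hip/hip_runtime.h>

  // Map each CUDA API name used in the code onto its HIP equivalent,
  // so the rest of the source compiles unchanged.
  #define cudaMalloc              hipMalloc
  #define cudaFree                hipFree
  #define cudaMemcpy              hipMemcpy
  #define cudaMemcpyHostToDevice  hipMemcpyHostToDevice
  #define cudaMemcpyDeviceToHost  hipMemcpyDeviceToHost
  #define cudaDeviceSynchronize   hipDeviceSynchronize
  #define cudaError_t             hipError_t
  #define cudaSuccess             hipSuccess
  #define cudaGetErrorString      hipGetErrorString
#else
  #include <cuda_runtime.h>
#endif
```

The appeal of this design is that the translation lives in one place, and the set of `#define`s doubles as an inventory of exactly which CUDA APIs the code depends on.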
P.S. This may also be sufficient to get Intel GPUs working via HIPCL, though I haven't tested that.