flatironinstitute / cufinufft

Nonuniform fast Fourier transforms of types 1 and 2, in 1D, 2D, and 3D, on the GPU

Running on AMD #156

Open csccva opened 4 months ago

csccva commented 4 months ago

Hello,

Are you aware of any attempts to hipify this library to run on AMD GPUs using HIP?

Cristian

ahbarnett commented 4 months ago

I am not, although others have asked. Have a look at the Discussions on the FINUFFT GitHub; there are others who may want to help. Best, Alex

-- Alex Barnett, Center for Computational Mathematics, Flatiron Institute, http://users.flatironinstitute.org/~ahb, 646-876-5942

csccva commented 4 months ago

Thank you for your reply. I have to run this on AMD, and my only option is hipify. Since I have never used the library yet, I need to ask: is there anything in the library so specific to CUDA that a port would require massive rewriting?

Cristian

ahbarnett commented 4 months ago

Melody's code uses shared memory (49kB per thread block), although that only affects type-1 transforms, and in my A6000 tests the speed of global memory seems to be catching up anyway. @blackwer, who has worked on the CUDA code most recently, may have opinions about porting.

blackwer commented 4 months ago

> Melody's code uses shared memory (49kB per thread block),

We should revisit this. It is an old constraint that doesn't really apply to modern hardware, where you can change these limits on the fly. It's been noted before; we just haven't done anything about it yet.

> @blackwer, who has worked on the CUDA code most recently, may have opinions about porting.

Porting is straightforward and requires relatively few modifications afaik. It's definitely on my list of "fun" side-projects to tackle. I could probably do it in an afternoon or two once I got a sense for the tooling (famous last words). That said I'm busy with some other projects right now so I don't really want to work on this immediately.

@csccva If you'd like to contribute... https://github.com/flatironinstitute/cufinufft/pull/116 would be a good starting point for inspiration. The code there isn't usable directly since the repo has diverged so significantly, but I doubt the requirements for the port have changed much.

Notable differences with the current code that might require some thinking:

  1. considerably less reliance on macros than in the version linked
  2. CMake rather than a Makefile
  3. the Python code is more generic than before, though I think it will probably "just work" without intervention (also famous last words)

csccva commented 4 months ago

Thank you for your reply. I infer that no special CUDA features are used. I can't look up right now how much shared memory is available per CU (SMP); this document suggests it is 64 KB, so that should be fine. I can give it a try with hipify. We have had quite good experience with it, and we are also exploring a header-only porting approach (https://github.com/cschpc/hop).

Last (stupid) question: this project was recommended to me by someone who used it. Is it possible to use it from C code, or only from Python?

Cristian

blackwer commented 4 months ago

> I can't look up right now how much shared memory is available per CU (SMP).

Don't worry about this. I'll deal with it later.

> I can give it a try with hipify. We have had quite good experience with it, and we are also exploring a header-only porting approach

Great! Please feel free to submit a PR.

> Last (stupid) question: this project was recommended to me by someone who used it. Is it possible to use it from C code, or only from Python?

We provide C/C++ bindings. See: https://github.com/flatironinstitute/finufft/tree/master/examples/cuda

csccva commented 4 months ago

Thank you for your replies. I will let you know how it goes.

Cristian