CEED / libCEED

CEED Library: Code for Efficient Extensible Discretizations
https://libceed.org
BSD 2-Clause "Simplified" License

Additional backends #26

Closed (tzanio closed this issue 2 years ago)

tzanio commented 6 years ago
jedbrown commented 5 years ago

With the announcement that OLCF Frontier will use AMD CPUs and GPUs, we should try to get that stack into our workflow. As on-node programming models, we can use HIP (an open-source, CUDA-like model that can target both CUDA and ROCm, and into which CUDA code can be translated almost automatically using hipify-clang) or OpenMP 5 offload. Note that HIP does not currently support run-time compilation.

HIP nominally compiles to CUDA with negligible overhead, but the toolchain needs to be installed to do so.

tcew commented 5 years ago

OCCA:HIP supports run-time compilation.

jeremylt commented 5 years ago

Our OCCA backend is in serious need of a performance overhaul, so it would be great if we could also include OCCA:HIP.

tcew commented 5 years ago

See: https://github.com/libocca/occa/blob/022b76829d43cbe20b719e6d5a54c9aff8fa178c/src/modes/hip/device.cpp#L230

jedbrown commented 5 years ago

Yes, I don't think anything special needs to be done for /gpu/occa/hip versus /gpu/occa/cuda, though the OCCA backend needs attention. My comment on run-time compilation was with regard to @YohannDudouit's native CUDA implementation.

I'm also curious about observed differences in performance characteristics between the Radeon Instinct and V100.

tcew commented 5 years ago

You should follow up with Noel Chalmers. I believe he has run libP experiments with the Radeon Instinct.

jedbrown commented 5 years ago

Thanks. @noelchalmers, can you share any experiments?

noelchalmers commented 5 years ago

Hi everyone. I'll try to chip in what I know on some of the points in this thread:

jedbrown commented 5 years ago

Thanks, @noelchalmers. On run-time compilation, I don't see anything about porting NVRTC to HIP.

Are there any public clouds with Radeon Instinct (for continuous integration, etc.)?

noelchalmers commented 5 years ago

I just realized that you were referring to NVRTC when you mentioned runtime compilation.

No, HIP currently doesn't support any nvrtc* API calls. I'm not aware of any plans to add these features, but I will ask around. What HIP does support is loading compiled binaries using hipModuleLoad, which is analogous to cuModuleLoad, and finding/launching kernels from that binary.
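To make the module-loading path concrete, here is a minimal, untested sketch of loading and launching a precompiled code object with the `hipModule*` API, which mirrors the CUDA driver API's `cuModule*` calls. The file name `vecadd.hsaco` and kernel name `vecadd` are hypothetical placeholders, and error checking is omitted for brevity:

```cpp
#include <hip/hip_runtime.h>

int main() {
  // Load a code object compiled offline (e.g., with hipcc --genco).
  // "vecadd.hsaco" and the kernel name "vecadd" are hypothetical.
  hipModule_t module;
  hipModuleLoad(&module, "vecadd.hsaco");

  hipFunction_t kernel;
  hipModuleGetFunction(&kernel, module, "vecadd");

  // Kernel arguments are passed as an array of pointers,
  // just as with cuLaunchKernel.
  float *x;
  int n = 1024;
  hipMalloc(&x, n * sizeof(float));
  void *args[] = {&x, &n};

  hipModuleLaunchKernel(kernel,
                        n / 64, 1, 1, // grid dimensions
                        64, 1, 1,     // block dimensions
                        0,            // shared memory bytes
                        nullptr,      // stream (default)
                        args, nullptr);

  hipDeviceSynchronize();
  hipFree(x);
  hipModuleUnload(module);
  return 0;
}
```

In practice every `hip*` call returns a `hipError_t` that should be checked.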

I don't know of any public clouds I can point to using MI-25s or MI-60s yet. Maybe for some CI tests you could try compiling on some Vega cards in a GPU Eater session? Not ideal, certainly.

jedbrown commented 5 years ago

Thanks. It looks like GPU Eater doesn't support docker-machine or Kubernetes, so CI integration would be custom and/or not autoscaling, but it's something.

jedbrown commented 5 years ago

Yet another C++ layer, this one providing single source for CPU, OpenCL, and HIP/CUDA. https://github.com/illuhad/hipSYCL

jedbrown commented 5 years ago

While I still don't see it on the docs website, hiprtc was apparently merged a few months ago. https://github.com/ROCm-Developer-Tools/HIP/pull/1097 I thought we discussed this specifically at CEED3AM and @noelchalmers and Damon were not aware that it existed. Is it something we should be trying now, or is the lack of documentation indication that it's still in easter-egg mode?
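For reference, hiprtc closely mirrors the NVRTC API. A hedged, untested sketch (assuming a ROCm installation where hiprtc is available; the kernel source and name `axpy` are made up for illustration) of compiling a kernel string at run time and loading the result:

```cpp
#include <hip/hiprtc.h>
#include <hip/hip_runtime.h>
#include <vector>

int main() {
  // Hypothetical kernel source; "axpy" is a placeholder name.
  const char *src =
      "extern \"C\" __global__ void axpy(float a, float *x, float *y) {\n"
      "  int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
      "  y[i] += a * x[i];\n"
      "}\n";

  // Same shape as nvrtcCreateProgram / nvrtcCompileProgram.
  hiprtcProgram prog;
  hiprtcCreateProgram(&prog, src, "axpy.cu", 0, nullptr, nullptr);
  hiprtcCompileProgram(prog, 0, nullptr);

  // Retrieve the compiled code object and load it, analogous to
  // nvrtcGetPTX followed by cuModuleLoadData.
  size_t code_size;
  hiprtcGetCodeSize(prog, &code_size);
  std::vector<char> code(code_size);
  hiprtcGetCode(prog, code.data());
  hiprtcDestroyProgram(&prog);

  hipModule_t module;
  hipModuleLoadData(&module, code.data());
  hipFunction_t kernel;
  hipModuleGetFunction(&kernel, module, "axpy");
  // ... set up arguments and launch with hipModuleLaunchKernel ...
  return 0;
}
```

As with the NVRTC version, compile logs (`hiprtcGetProgramLog`) and error codes should be checked before trusting the output.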

jedbrown commented 2 years ago

I'll close this open-ended issue. There is an improved OCCA backend coming in #1043. I think at this point we can make new issues for specific backend requests.