NVIDIA / cccl

CUDA Core Compute Libraries
https://nvidia.github.io/cccl/

This library should be usable directly by a host compiler to make compatible host objects #940

Open · ogiroux opened this issue 4 years ago

ogiroux commented 4 years ago

In principle this should work, and it's intended to work, but I think a lack of testing has allowed this aspect to regress.

brycelelbach commented 4 years ago

This adds a lot of testing overhead. What's the value-add for this?

brycelelbach commented 4 years ago

I'm not 100% clear what the ask is here. You want cuda::std:: to work standalone with a host compiler? Is that right?

nvibd commented 3 years ago

Unfortunately, it doesn't work for all cases. We've found one such issue recently, see https://github.com/NVIDIA/cccl/issues/968.

In our project, we use quite a few templates for more flexibility between host & device code, but that also makes it harder to cleanly separate all device-only code from the host compilation. In one case, we include <cuda/std/atomic> for some device code, and that include propagates to the host compilation. We could wrap the include in an #ifdef __CUDA_ARCH__ guard, but that would defeat one of the purposes of using libcu++ in the first place: requiring as few such switches as possible.
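
For illustration, a minimal sketch of the kind of switch we would rather avoid; the guard macro and the host-side fallback alias are assumptions for the example, not code from our project:

#if defined(__CUDACC__)
// CUDA compilation: use the heterogeneous atomic from libcu++.
#  include <cuda/std/atomic>
using counter_t = cuda::std::atomic<int>;
#else
// Plain host compilation (g++/clang++): fall back to the standard library.
#  include <atomic>
using counter_t = std::atomic<int>;
#endif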

As the front page of the project says: "It provides a heterogeneous implementation of the C++ Standard Library that can be used in and between CPU and GPU code." That sounds like a great advantage to me, and I would hope to see this compatibility improved further! :)

gonzalobg commented 3 years ago

This fails (save it as bug.cpp and compile/run it with bash bug.cpp):

#if 0 
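  # Run as `bash bug.cpp`: bash treats the preprocessor lines as comments and
  # executes this block, which builds the file with a plain host compiler
  # (g++, no nvcc) and runs it. The C++ compiler skips the block via #if 0.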
  set -e
  g++ -std=c++14 $0 -o bug
  ./bug
  exit 0
#endif
#include <cuda/std/complex>
int main() {
   auto x = cuda::std::complex<double>{1., 1.};
   auto y = x + x;
   return 0;
}

and is a minimal reproducer of a bug we hit during the Juelich hackathon over the past two weeks while porting a C++14 solid-state physics app to GPUs.

The app depends on a C++14 template library (Blaze, similar to Eigen3) that causes NVCC and NVC++ to ICE, so we had to compile most of the app with g++ and scope GPU acceleration to separate TUs.

The app uses std::complex everywhere in all module APIs, but since its layout differs from that of cuda::std::complex and cuDoubleComplex, we cannot pass raw memory between the parts of the app compiled with g++ or clang and the parts compiled with nvcc/nvc++.

Workaround: add overloads for cuDoubleComplex that mock the std::complex API without changing its ABI.
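
For concreteness, a minimal sketch of that workaround, assuming the cuCreal/cuCimag/cuCadd/cuCmul helpers from cuComplex.h; the set of overloads shown is illustrative, not the full set we added:

#include <cuComplex.h>

// Free functions only, so the layout/ABI of cuDoubleComplex stays untouched.
__host__ __device__ inline double real(cuDoubleComplex z) { return cuCreal(z); }
__host__ __device__ inline double imag(cuDoubleComplex z) { return cuCimag(z); }

__host__ __device__ inline cuDoubleComplex
operator+(cuDoubleComplex a, cuDoubleComplex b) { return cuCadd(a, b); }

__host__ __device__ inline cuDoubleComplex
operator*(cuDoubleComplex a, cuDoubleComplex b) { return cuCmul(a, b); }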

@brycelelbach

What's the value-add for this?

libcu++: The C++ Standard Library for Your Entire System

The value this feature adds is allowing libcu++ to interface with the system.

If libcu++ cannot be compiled by the most widely used compilers on the system (g++ and clang++ on Linux), then it cannot be used in APIs/ABIs that must interface with the system, and its usage must therefore be scoped to the implementation details of translation units that do not interface with the system.

I think that is a serious limitation.

The bigger libcu++ gets, the more work it will take to fix this.

brycelelbach commented 3 years ago

We'll try to prioritize this for the summer. 2.1.0 timeframe.

maddyscientist commented 3 years ago

Adding to the motivation for this: it turns out this is a big deal for QUDA as well and prevents QUDA from adopting cuda::std::complex, both for files compiled only by the host compiler (e.g., .cpp files) and when we use g++ with nvrtc instead of g++ with nvcc.

jrhemstad commented 1 year ago

We want to do this, and we'll just need to figure out how to modify the lit infrastructure to compile tests with just a host compiler.