Open ogiroux opened 4 years ago
This adds a lot of testing overhead. What's the value-add for this?
I'm not 100% clear what the ask is here. You want cuda::std:: working standalone as host? Is that right?
Unfortunately, it doesn't work for all cases. We've found one such issue recently, see https://github.com/NVIDIA/cccl/issues/968.
In our project, we use quite a few templates for more flexibility between host & device code. But it also means it's harder to separate all device-only code cleanly from the host compilation. In one case, we include <cuda/std/atomic> for some device code which propagates to host compilation. We could embed that include into an #ifdef __CUDA_ARCH__, but that would defeat one of the purposes of using libcu++ in the first place: requiring as few such switches as possible.
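For illustration only, the kind of switch mentioned above might look like the sketch below. The macro name PROJECT_HAS_CUDA_STD_ATOMIC and the helper has_cuda_std_atomic are hypothetical, not part of libcu++ or of the project being discussed. It relies only on the fact that __CUDA_ARCH__ is defined during device compilation passes and left undefined by a plain host compiler.

```cpp
// Sketch: hide the libcu++ include from host-only compilation.
// PROJECT_HAS_CUDA_STD_ATOMIC is a hypothetical project macro.
#ifdef __CUDA_ARCH__
// Device compilation pass: the libcu++ header is visible.
#include <cuda/std/atomic>
#define PROJECT_HAS_CUDA_STD_ATOMIC 1
#else
// Host compilation (e.g. plain g++): the header is never seen.
#define PROJECT_HAS_CUDA_STD_ATOMIC 0
#endif

// Hypothetical helper so other code can query the switch.
constexpr bool has_cuda_std_atomic() {
    return PROJECT_HAS_CUDA_STD_ATOMIC != 0;
}
```

Every header that pulls in libcu++ would need a guard like this, plus matching guards around the code that uses it, which is exactly the proliferation of switches we would like to avoid.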
As the front page of the project says: "It provides a heterogeneous implementation of the C++ Standard Library that can be used in and between CPU and GPU code". It really sounds like a great advantage to me and I would hope to see the compatibility be improved further! :)
This fails (compile with bash bug.cpp):
#if 0
set -e
g++ -std=c++14 $0 -o bug
./bug
exit 0
#endif
#include <cuda/std/complex>
int main() {
    auto x = cuda::std::complex<double>{1., 1.};
    auto y = x + x;
    return 0;
}
and is a minimum reproducer of a bug that we hit during the Juelich hackathon over the past 2 weeks while porting a C++14 solid state physics app to GPUs.
This app has a C++14 template library dependency (blaze; similar to Eigen3), that causes NVCC and NVC++ to ICE, so we had to compile most of the app with g++, scoping GPU acceleration to separate TUs.
This app uses std::complex everywhere on all module APIs, but since its layout differs from that of cuda::std::complex and cuDoubleComplex, we can't interface raw memory between the parts of the app compiled with g++ or clang, and the parts compiled with nvcc/nvc++.
Workaround: add overloads to cuDoubleComplex to mock std::complex API without changing its ABI.
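A minimal sketch of what such a workaround could look like, under stated assumptions. To keep it buildable without the CUDA toolkit, it uses a hypothetical stand-in type, DoubleComplex, in place of cuDoubleComplex; the assumed layout is two doubles named x and y with 16-byte alignment. The free functions real, imag, to_std and from_std are illustrative names, not an existing API. Because only free functions and operators are added, the struct's size and alignment stay untouched.

```cpp
#include <complex>

// Hypothetical stand-in for cuDoubleComplex (assumption: two doubles,
// x = real part, y = imaginary part, 16-byte aligned).
struct alignas(16) DoubleComplex {
    double x;
    double y;
};

// Free functions mimicking part of the std::complex interface.
constexpr double real(const DoubleComplex& z) { return z.x; }
constexpr double imag(const DoubleComplex& z) { return z.y; }

constexpr DoubleComplex operator+(const DoubleComplex& a, const DoubleComplex& b) {
    return DoubleComplex{a.x + b.x, a.y + b.y};
}

constexpr DoubleComplex operator*(const DoubleComplex& a, const DoubleComplex& b) {
    return DoubleComplex{a.x * b.x - a.y * b.y, a.x * b.y + a.y * b.x};
}

// Copy-based conversions for the boundary with code that uses std::complex.
inline std::complex<double> to_std(const DoubleComplex& z) {
    return std::complex<double>(z.x, z.y);
}
inline DoubleComplex from_std(const std::complex<double>& z) {
    return DoubleComplex{z.real(), z.imag()};
}

// Adding overloads must not change size or alignment.
static_assert(sizeof(DoubleComplex) == 2 * sizeof(double), "size changed");
static_assert(alignof(DoubleComplex) == 16, "alignment changed");
```

With these overloads, the reproducer's x + x compiles against the stand-in type. Crossing into std::complex code still goes through the copying to_std and from_std helpers rather than reinterpreting raw memory.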
@brycelelbach
What's the value-add for this?
libcu++: The C++ Standard Library for Your Entire System
The value this feature adds is allowing libcu++ to interface with the system.
If libcu++ cannot be compiled by the most widely-used compilers in the system (g++ and clang++ on Linux), it cannot then be used on APIs/ABIs that must interface with the system, and therefore its usage must be scoped to the implementation-details of translation units that do not interface with the system.
I think that is a serious limitation.
The bigger libcu++ gets, the more work it will take to fix this.
We'll try to prioritize this for the summer. 2.1.0 timeframe.
Adding to the motivation for this: it turns out this is a big deal for QUDA as well and prevents QUDA from adopting cuda::std::complex, both for the files that only use the host compiler (e.g., .cpp files) and when we use g++ and nvrtc as opposed to g++ and nvcc.
We want to do this, and we'll just need to figure out how to modify the lit infrastructure to compile tests with just a host compiler.
In principle this should work, and it's intended to work, but I think a lack of testing has allowed this aspect to regress.