StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0
687 stars 144 forks source link

Thrust types not ABI-stable between nvcc and non-nvcc TUs #1648

Closed manopapad closed 6 months ago

manopapad commented 8 months ago

Thrust 2.03 added a nested namespace to some types, which encodes the CUDA architecture list. The problem is that now types diverge between files compiled with nvcc (which defines __CUDA_ARCH_LIST__) and gcc (which doesn't).

In particular, thrust::complex<float> becomes a different type in legion_redop.cu and legion_redop.cc, causing the latter to define each SomeLegionReduction<complex<T>>::identity using a different symbol name than what the former is looking for, eventually causing a linker error:

$ objdump -t  legion_redop.cu.o | grep identity | c++filt
[...]
0000000000000000         *UND*  0000000000000000 Legion::DiffReduction<thrust::THRUST_200300_800_NS::complex<float> >::identity
[...]
$ objdump -t legion_redop.cc.o | grep identity | c++filt
[...]
0000000000000020 g     O .rodata    0000000000000008 Legion::DiffReduction<thrust::THRUST_200300___CUDA_ARCH_LIST___NS::complex<float> >::identity
[...]

I confirmed that this is expected behavior; thrust types don't have a consistent ABI between nvcc and non-nvcc TUs. The guidance is to switch to using cuda::std:: types (from CCCL or libcu++).

manopapad commented 8 months ago

Due to this bug, building Legion on an Ubuntu 22.04 with CUDA 12.4 apt packages will fail.

https://gitlab.com/StanfordLegion/legion/-/merge_requests/1151 works around the build failure, by allowing the user to control which Thrust is used in the build (any version before 2.0.3 will do), Of course we still need to address the root problem.

manopapad commented 8 months ago

It turns out that simply replacing thrust::complex with cuda::std::complex is sufficient to fix all issues. The issue with using that is that now Legion_REDOP_COMPLEX would imply a dependency on libcudacxx 1.4.0, which only became part of the CUDA toolkit as of 11.3.

elliottslaughter commented 8 months ago

That seems reasonable to me. Our officially documented minimum is CUDA 10, but we don't test it in CI and I don't imagine anyone is actually using it. In Sapling, our oldest CUDA is currently 11.7.

lightsighter commented 8 months ago

I talked with @mpokorny today. He does indeed use the built-in complex reduction operators. I asked him to check on his CUDA versions to see if using 11.3 or later would be an issue for him.

I personally am fine with switching to using cuda::std::complex as I think it will probably be better maintained going forward and it's a more natural dependence on CUDA than on something like Thrust which has kind of been a part of CUDA but not really (even though it ships with it). The people maintaining cuda::std should be much better about their support and not making backwards breaking changes.

elliottslaughter commented 8 months ago

I agree with this approach. I think complex reductions are tricky enough that we want to "do it right" on behalf of our users.

mpokorny commented 8 months ago

Requiring CUDA version 11.3 or later would be fine with me.

elliottslaughter commented 6 months ago

I think this is fixed now?

manopapad commented 6 months ago

Yes indeed, fixed with https://gitlab.com/StanfordLegion/legion/-/merge_requests/1157