Closed manopapad closed 6 months ago
Due to this bug, building Legion on an Ubuntu 22.04 with CUDA 12.4 apt packages will fail.
https://gitlab.com/StanfordLegion/legion/-/merge_requests/1151 works around the build failure, by allowing the user to control which Thrust is used in the build (any version before 2.0.3 will do), Of course we still need to address the root problem.
It turns out that simply replacing thrust::complex
with cuda::std::complex
is sufficient to fix all issues. The issue with using that is that now Legion_REDOP_COMPLEX
would imply a dependency on libcudacxx
1.4.0, which only became part of the CUDA toolkit as of 11.3.
That seems reasonable to me. Our officially documented minimum is CUDA 10, but we don't test it in CI and I don't imagine anyone is actually using it. In Sapling, our oldest CUDA is currently 11.7.
I talked with @mpokorny today. He does indeed use the built-in complex reduction operators. I asked him to check on his CUDA versions to see if using 11.3 or later would be an issue for him.
I personally am fine with switching to using cuda::std::complex
as I think it will probably be better maintained going forward and it's a more natural dependence on CUDA than on something like Thrust which has kind of been a part of CUDA but not really (even though it ships with it). The people maintaining cuda::std
should be much better about their support and not making backwards breaking changes.
I agree with this approach. I think complex reductions are tricky enough that we want to "do it right" on behalf of our users.
Requiring CUDA version 11.3 or later would be fine with me.
I think this is fixed now?
Yes indeed, fixed with https://gitlab.com/StanfordLegion/legion/-/merge_requests/1157
Thrust 2.03 added a nested namespace to some types, which encodes the CUDA architecture list. The problem is that now types diverge between files compiled with
nvcc
(which defines__CUDA_ARCH_LIST__
) andgcc
(which doesn't).In particular,
thrust::complex<float>
becomes a different type inlegion_redop.cu
andlegion_redop.cc
, causing the latter to define eachSomeLegionReduction<complex<T>>::identity
using a different symbol name than what the former is looking for, eventually causing a linker error:I confirmed that this is expected behavior; thrust types don't have a consistent ABI between nvcc and non-nvcc TUs. The guidance is to switch to using
cuda::std::
types (from CCCL or libcu++).