[BUG]: The CUDA SDK defines the reserved identifier __noinline__, breaking Clang and GCC interoperation

ldionne commented 11 months ago

Is this a duplicate?

[X] I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct

Type of Bug

Compile-time Error

Component

Not sure

Describe the bug

I maintain libc++, the C++ Standard Library shipped with LLVM / Clang. We recently received a bug report explaining that using Clang (and libc++ in particular) with the CUDA SDK didn't work anymore, because the CUDA SDK is defining __noinline__ to __attribute__((__noinline__)) for convenience and that conflicts with libc++'s usage of __attribute__((__noinline__)).

This is both non-standard and poor practice on the CUDA SDK's side -- underscore names are reserved for the programming language implementation. It seems like this was reported in the past as https://github.com/NVIDIA/thrust/issues/1703 but I'm not certain the problem was taken seriously.

I would like to gauge whether there is interest for migrating away from that macro and restoring proper interoperability between the CUDA SDK and Clang, GCC and their standard library implementations. If you can establish a migration path away from the macro, libc++ can work around the issue in the meantime to avoid leaving users stranded. However, we would like to have a commitment from CUDA that a migration path will be created to fix the problem in the long term -- otherwise libc++ would just be bending backwards to work around arbitrary vendor's decisions forever, and that is not workable for us.

Note that while this problem is not widespread yet, it will start hitting anyone who updates to LLVM 18 because libc++ introduced new uses of __attribute__((__noinline__)). We expect this is going to become a fairly widespread problem if nothing is done.

Note: If this is not the right place to file a bug against the CUDA SDK, please let me know where I can do so. I am not a CUDA SDK user myself, but I am reaching out because I believe our two implementations working together well is important for the ecosystem.

How to Reproduce

#include <cuda/something>
#include <string>

Expected behavior

It compiles

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

ldionne commented 11 months ago

CC @jrhemstad and @gevtushenko Since I think you chimed in on the Thrust bug referenced here.

miscco commented 11 months ago

HI @ldionne, thanks for reaching out.

I fear that this is the wrong place for the issue, as we have unfortunately nothing to do with the cuda runtime header.

I have reached out internally to find out who to contact regarding this issue. We really want to keep building with libc++

ldionne commented 11 months ago

Ah, that's kind of what I expected. Thanks, I'll wait to hear back!

miscco commented 11 months ago

Hey @ldionne,

unfortunately the feedback that I got is that those identifiers will stay. From the viewpoint of the CUDA SDK it is the implementation of the CUDA programming language, so those identifiers are good to use.

As I understand it there are plans to address this on either within clang or the on the libc++ side.

philnik777 commented 11 months ago

@miscco (I realize that arguing through a third party isn't great, but I don't think I have much of a choice. Sorry.) Even if the CUDA SDK is part of the implementation, why is it so hard to make things easier for other parts of the implementation? Since this is internal to the CUDA SDK, it should be trivial to find-and-replace __noinline__ with something like __cuda_noinline__. libc++ prefixes almost all internal macros with _LIBCPP to avoid clashing with other parts of the implementation, and I'm pretty sure we'd rename stuff if they interfere with other things. We've done that a lot with the type traits.

ldionne commented 11 months ago

As I understand it there are plans to address this on either within clang or the on the libc++ side.

There would be plans to work around this issue until the CUDA SDK stops defining the identifier, yes. But since there seems to be no desire from the CUDA SDK to be compatible with other implementations (Clang and GCC, both of which allow the use of __attribute__((__noinline__))), that seriously raises the question of how hard we should try to bend backwards on our side.

Basically, we're willing to work around the issue temporarily while CUDA fixes the underlying problem to make our users lives better. But if CUDA itself doesn't care about those users, I don't think it's viable for us to start working around every wrench that might be thrown at us unilaterally. Frankly, that stance from the CUDA SDK seems hostile to other implementations and to the ecosystem as a whole.

Is there nobody from the CUDA SDK we can have a direct conversation with? It seems a bit awkward to have this here, through you.

jrhemstad commented 10 months ago

Hey @ldionne. CCCL doesn't own the header in question, but we've been aware of the problem for a while.

I've been working on making noise about this internally for a while. Lots of people out on holiday for the next week or two, but I'll report back as soon as I can.

jrhemstad commented 10 months ago

Uneventful update: I am still working on escalating this internally.

NVIDIA / cccl