nvcc fatal: Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures

NVIDIA / cuda-samples

Samples for CUDA Developers which demonstrates features in CUDA Toolkit

I'm building the CUDA samples for multiple architectures, which is documented as supported through the SMS option. My build command is:

make  -j 72 HOST_COMPILER=g++ SMS='80 86'

I've encountered the issue with both cuda-samples 11.3 and 12.2. It is present in at least two samples: memMapIPCDrv and ptxjit. The failure comes from the PTX rule in each sample's Makefile, which reads as follows (shown for memMapIPCDrv, with some context; the ptxjit rule differs only in the source file name):

$(PTX_FILE): memMapIpc_kernel.cu
    $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS) -o $@ -ptx $<
    $(EXEC) mkdir -p data
    $(EXEC) cp -f $@ ./data
    $(EXEC) mkdir -p ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)
    $(EXEC) cp -f $@ ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)
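
To see why nvcc rejects this rule, it helps to look at what `$(GENCODE_FLAGS)` plausibly expands to for `SMS='80 86'`. The sketch below is a shell approximation, not the Makefile itself: `gencode_flags` is a hypothetical helper, and the per-architecture `code=sm_XX` entries are an assumption about the part of the Makefile that is not quoted in this report. Only the final `code=compute_XX` entry is taken from the excerpt shown here.

```shell
# Hypothetical helper: approximate the expansion of $(GENCODE_FLAGS)
# for a space-separated SMS list such as "80 86".
gencode_flags() {
    sms=$1
    flags=""
    # Assumption: one SASS target per listed architecture.
    for sm in $sms; do
        flags="$flags -gencode arch=compute_${sm},code=sm_${sm}"
    done
    # From the quoted Makefile: PTX for the highest listed architecture.
    highest=$(printf '%s\n' $sms | sort -n | tail -n 1)
    if [ -n "$highest" ]; then
        flags="$flags -gencode arch=compute_${highest},code=compute_${highest}"
    fi
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}

gencode_flags "80 86"
```

With two entries in SMS this prints three `-gencode` options, which is the multi-architecture situation the nvcc error message describes.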

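I believe the GENCODE_FLAGS used for PTX file generation should be stored separately, because with several entries in SMS, $(GENCODE_FLAGS) names more than one architecture, which -ptx rejects. That is, this block should probably read: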

# Generate PTX code from the highest SM architecture in $(SMS) to guarantee forward-compatibility
HIGHEST_SM := $(lastword $(sort $(SMS)))
ifneq ($(HIGHEST_SM),)
GENCODE_FLAGS += -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
GENCODE_FLAGS_HIGHEST_SM = -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
endif
endif
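
One caveat on `$(sort $(SMS))`: GNU make's `sort` function orders words lexically, so once SMS mixes two-digit and three-digit values, `lastword` is no longer the numerically highest one. The shell sketch below illustrates the difference. `highest_lexical` and `highest_numeric` are hypothetical helper names, and `LC_ALL=C sort` is used as a stand-in for make's lexical ordering.

```shell
# Lexical ordering, approximating $(lastword $(sort $(SMS))).
highest_lexical() {
    printf '%s\n' $1 | LC_ALL=C sort | tail -n 1
}

# Numeric ordering, which is what "highest SM" is meant to express.
highest_numeric() {
    printf '%s\n' $1 | sort -n | tail -n 1
}

highest_lexical "80 86"    # both orderings agree for two-digit values
highest_numeric "80 86"
highest_lexical "90 100"   # lexical order places 100 before 90
highest_numeric "90 100"
```

For the `SMS='80 86'` build in this report the two orderings agree, so the caveat does not affect the patch as tested.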

The offending rule would then be modified to:

$(PTX_FILE): memMapIpc_kernel.cu
    $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS_HIGHEST_SM) -o $@ -ptx $<
    $(EXEC) mkdir -p data
    $(EXEC) cp -f $@ ./data
    $(EXEC) mkdir -p ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)
    $(EXEC) cp -f $@ ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)

I can at least confirm that with this diff:

$ cat *.patch
diff -Nru cuda-samples-12.2.orig/Samples/3_CUDA_Features/memMapIPCDrv/Makefile cuda-samples-12.2/Samples/3_CUDA_Features/memMapIPCDrv/Makefile
--- cuda-samples-12.2.orig/Samples/3_CUDA_Features/memMapIPCDrv/Makefile        2024-07-29 12:14:28.538848000 +0200
+++ cuda-samples-12.2/Samples/3_CUDA_Features/memMapIPCDrv/Makefile     2024-07-29 12:17:02.812364739 +0200
@@ -312,6 +312,7 @@
 HIGHEST_SM := $(lastword $(sort $(SMS)))
 ifneq ($(HIGHEST_SM),)
 GENCODE_FLAGS += -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
+GENCODE_FLAGS_HIGHEST_SM = -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
 endif
 endif

@@ -394,7 +395,7 @@
 endif

 $(PTX_FILE): memMapIpc_kernel.cu
-       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS) -o $@ -ptx $<
+       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS_HIGHEST_SM) -o $@ -ptx $<
        $(EXEC) mkdir -p data
        $(EXEC) cp -f $@ ./data
        $(EXEC) mkdir -p ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)
diff -Nru cuda-samples-12.2.orig/Samples/3_CUDA_Features/ptxjit/Makefile cuda-samples-12.2/Samples/3_CUDA_Features/ptxjit/Makefile
--- cuda-samples-12.2.orig/Samples/3_CUDA_Features/ptxjit/Makefile      2024-07-29 12:14:28.546771000 +0200
+++ cuda-samples-12.2/Samples/3_CUDA_Features/ptxjit/Makefile   2024-07-29 12:15:47.089354181 +0200
@@ -306,6 +306,7 @@
 HIGHEST_SM := $(lastword $(sort $(SMS)))
 ifneq ($(HIGHEST_SM),)
 GENCODE_FLAGS += -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
+GENCODE_FLAGS_HIGHEST_SM = -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
 endif
 endif

@@ -390,7 +391,7 @@
 endif

 $(PTX_FILE): ptxjit_kernel.cu
-       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS) -o $@ -ptx $<
+       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS_HIGHEST_SM) -o $@ -ptx $<
        $(EXEC) mkdir -p data
        $(EXEC) cp -f $@ ./data
        $(EXEC) mkdir -p ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)

Applied on top of the cuda-samples 12.2 sources, this builds correctly for multiple architectures. What I'm not 100% sure of is whether the change makes sense, so I'm hoping someone else can confirm that :)

$ memMapIPCDrv
> findModulePath found file at <./memMapIpc_kernel64.ptx>
> initCUDA loading module: <./memMapIpc_kernel64.ptx>
> findModulePath found file at <./memMapIpc_kernel64.ptx>
> findModulePath found file at <./memMapIpc_kernel64.ptx>
> initCUDA loading module: <./memMapIpc_kernel64.ptx>
> initCUDA loading module: <./memMapIpc_kernel64.ptx>
> findModulePath found file at <./memMapIpc_kernel64.ptx>
> initCUDA loading module: <./memMapIpc_kernel64.ptx>
checkCudaErrors() Driver API error = 0218 "a PTX JIT compilation failed" from file <memMapIpc.cpp>, line 292.
checkCudaErrors() Driver API error = 0218 "a PTX JIT compilation failed" from file <memMapIpc.cpp>, line 292.
checkCudaErrors() Driver API error = 0218 "a PTX JIT compilation failed" from file <memMapIpc.cpp>, line 292.
checkCudaErrors() Driver API error = 0218 "a PTX JIT compilation failed" from file <memMapIpc.cpp>, line 292.
Process 0 failed!
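
A PTX file records the architecture it was generated for in its `.target` directive, which offers a quick way to check whether the file shipped with the sample matches the GPU it is loaded on. The helper below is a sketch: `ptx_target` is a hypothetical name, and the file path in the usage lines is the one from the log above.

```shell
# Print the architecture named by the first .target directive in a PTX file,
# e.g. "sm_86". Prints nothing if the file has no such directive.
ptx_target() {
    awk '$1 == ".target" { sub(/,.*/, "", $2); print $2; exit }' "$1"
}

# Usage (runs only if the file exists in the current directory):
if [ -f ./memMapIpc_kernel64.ptx ]; then
    ptx_target ./memMapIpc_kernel64.ptx
fi
```

If this reports a newer architecture than the GPU in use (for example `sm_86` on an `sm_80` device), the driver cannot JIT-compile the PTX for that device, which would be consistent with the error above.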

diff -Nru cuda-samples-12.2.orig/Samples/3_CUDA_Features/memMapIPCDrv/Makefile cuda-samples-12.2/Samples/3_CUDA_Features/memMapIPCDrv/Makefile
--- cuda-samples-12.2.orig/Samples/3_CUDA_Features/memMapIPCDrv/Makefile 2024-07-29 12:14:28.538848000 +0200
+++ cuda-samples-12.2/Samples/3_CUDA_Features/memMapIPCDrv/Makefile 2024-07-29 13:02:45.134261829 +0200
@@ -313,6 +313,12 @@
 ifneq ($(HIGHEST_SM),)
 GENCODE_FLAGS += -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
 endif
+
+# Generate the explicit PTX file for the lowest SM architecture in $(SMS), so it works on all SMS listed there
+LOWEST_SM := $(firstword $(sort $(SMS)))
+ifneq ($(LOWEST_SM),)
+GENCODE_FLAGS_LOWEST_SM += -gencode arch=compute_$(LOWEST_SM),code=compute_$(LOWEST_SM)
+endif
 endif

 ifeq ($(TARGET_OS),darwin)
@@ -394,7 +400,7 @@
 endif

 $(PTX_FILE): memMapIpc_kernel.cu
-       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS) -o $@ -ptx $<
+       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS_LOWEST_SM) -o $@ -ptx $<
        $(EXEC) mkdir -p data
        $(EXEC) cp -f $@ ./data
        $(EXEC) mkdir -p ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)
diff -Nru cuda-samples-12.2.orig/Samples/3_CUDA_Features/ptxjit/Makefile cuda-samples-12.2/Samples/3_CUDA_Features/ptxjit/Makefile
--- cuda-samples-12.2.orig/Samples/3_CUDA_Features/ptxjit/Makefile 2024-07-29 12:14:28.546771000 +0200
+++ cuda-samples-12.2/Samples/3_CUDA_Features/ptxjit/Makefile 2024-07-29 13:02:38.741961008 +0200
@@ -307,6 +307,12 @@
 ifneq ($(HIGHEST_SM),)
 GENCODE_FLAGS += -gencode arch=compute_$(HIGHEST_SM),code=compute_$(HIGHEST_SM)
 endif
+
+# Generate the explicit PTX file for the lowest SM architecture in $(SMS), so it works on all SMS listed there
+LOWEST_SM := $(firstword $(sort $(SMS)))
+ifneq ($(LOWEST_SM),)
+GENCODE_FLAGS_LOWEST_SM += -gencode arch=compute_$(LOWEST_SM),code=compute_$(LOWEST_SM)
+endif
 endif

 ifeq ($(TARGET_OS),darwin)
@@ -390,7 +396,7 @@
 endif

 $(PTX_FILE): ptxjit_kernel.cu
-       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS) -o $@ -ptx $<
+       $(EXEC) $(NVCC) $(INCLUDES) $(ALL_CCFLAGS) $(GENCODE_FLAGS_LOWEST_SM) -o $@ -ptx $<
        $(EXEC) mkdir -p data
        $(EXEC) cp -f $@ ./data
        $(EXEC) mkdir -p ../../../bin/$(TARGET_ARCH)/$(TARGET_OS)/$(BUILD_TYPE)
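
The reasoning behind targeting the lowest SM is that PTX can be JIT-compiled for GPUs of the same or a newer compute capability than its target, but not for older ones, so a single PTX file meant to serve every entry in SMS has to target the lowest one. A shell sketch of that selection follows, with `ptx_gencode_flag` as a hypothetical helper name and numeric sorting used in place of make's `sort`:

```shell
# Build the single -gencode option for the standalone PTX file:
# target the numerically lowest architecture in the SMS list.
ptx_gencode_flag() {
    lowest=$(printf '%s\n' $1 | sort -n | head -n 1)
    # Mirror the Makefile's ifneq guard: emit nothing for an empty list.
    if [ -n "$lowest" ]; then
        printf '%s\n' "-gencode arch=compute_${lowest},code=compute_${lowest}"
    fi
}

ptx_gencode_flag "86 80"
```

Because only one `-gencode` option is produced regardless of how many architectures SMS lists, the `-ptx` restriction on multiple architectures no longer applies to this rule.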

nvcc fatal: Option '--ptx (-ptx)' is not allowed when compiling for multiple GPU architectures #289