support linking CUDA device code across different libraries

fwyzard commented 6 years ago

A first step should be to fix the call to nvcc for linking device code. The current rule

  $(VERB_ECHO) $(SCRAM_PREFIX_COMPILER_COMMAND) "$(NVCC) $(call AdjustFlags,$1,,CUDA_LDFLAGS CUDA_FLAGS)  $? -o $@" &&\
              ($(SCRAM_PREFIX_COMPILER_COMMAND) $(NVCC) $(call AdjustFlags,$1,,CUDA_LDFLAGS CUDA_FLAGS) $? -o $@) $(endlog_$(1))

should become

  $(VERB_ECHO) $(SCRAM_PREFIX_COMPILER_COMMAND) "$(NVCC) -dlink $(call AdjustFlags,$1,,CUDA_LDFLAGS CUDA_FLAGS) --compiler-options '$(call AdjustCudaHostFlags,$1,CXXFLAGS) $(CXXSHAREDOBJECTFLAGS)' $? -o $@" &&\
              ($(SCRAM_PREFIX_COMPILER_COMMAND) $(NVCC) -dlink $(call AdjustFlags,$1,,CUDA_LDFLAGS CUDA_FLAGS) --compiler-options '$(call AdjustCudaHostFlags,$1,CXXFLAGS) $(CXXSHAREDOBJECTFLAGS)' $? -o $@) $(endlog_$(1))

where we explicitly pass -dlink, and also pass the full set of flags to the host compiler. This should go together with the changes to the cuda toolfile at https://github.com/cms-sw/cmsdist/pull/4168 .

Then, to avoid some confusion between the existing $(1)_cudaobj variable and $(1)_cudaobjs that I propose to add, I suggest to rename the former to $(1)_cudadlink (short for device link):

-$(1)_cudaobj      := $$($(1)_objdir)/$(1)_scramcuda.$(OBJEXT)
+$(1)_cudadlink    := $$($(1)_objdir)/$(1)_cudadlink.$(OBJEXT)
...
-$(1)_objs         := $$(addprefix $$($(1)_objdir)/, $$(addsuffix .$(OBJEXT),$$($(1)_files))) $$(if $(SYNTAX_ONLY),,$$($(1)_cudaobj))
+$(1)_objs         := $$(addprefix $$($(1)_objdir)/, $$(addsuffix .$(OBJEXT),$$($(1)_files))) $$(if $(SYNTAX_ONLY),,$$($(1)_cudadlink))
...
-ifneq ($(strip $($(1)_cudaobj)),)
-$($(1)_cudaobj): $$(addprefix $($(1)_objdir)/, $(addsuffix .$(OBJEXT), $($(1)_cudafiles)))
+ifneq ($(strip $($(1)_cudadlink)),)
+$($(1)_cudadlink): $$(addprefix $($(1)_objdir)/, $(addsuffix .$(OBJEXT), $($(1)_cudafiles)))

Finally, to support linking CUDA device code across different libraries, for each package with CUDA source files we need to:

define a corresponding list of device object files and a static library (I picked the suffix _nv.o because of the ELF sections they are going to have, but feel free to use anything else):

 ifneq ($(strip $$($(1)_cudafiles)),)
 $(1)_cudadlink    := $$($(1)_objdir)/$(1)_cudadlink.$(OBJEXT)
+$(1)_cudaobjs     := $$(addprefix $$($(1)_objdir)/, $$(addsuffix _nv.$(OBJEXT),$$($(1)_cudafiles)))
 endif

copy the ELF sections __nv_relfatbin, __nv_module_id and .nvFatBinSegment to the device object files:
```
$($(1)_cudaobjs): %_nv.$(OBJEXT): %.$(OBJEXT)
    objcopy -j '.nv*' -j '__nv*' $^ $@
```
link these device object files into a static library:
```
lib$(1)_nv.a:
    ar crs ... $($(1)_cudaobjs)
```
during the device link step, take into account the full dependencies from the BuildFiles, but use these _nv.a libraries instead of the standard .so libraries.

One question: do we want to keep these _nv.a files in the same place as the .so libraries, or in a separate directory ? They are only used for linking, not at run time.

cmsbuild commented 6 years ago

A new Issue was created by @fwyzard Andrea Bocci.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

smuzaffar commented 6 years ago

@fwyzard , what is $$($(1)_objdir)/$(1)_cudadlink_nv.$(OBJEXT) and do we need it in $(1)_cudaobjs? I guess these .a are not needed at runtime. So may be better to have then in some separate directory e.g cuda_device/SCRAM_ARCH ??

fwyzard commented 6 years ago

Sorry, that was a mistake, we don't need to generate that file. I have just updated the description.

A separate directory sounds fine to me. I would use a more generic name, in case we need static libraries for other stuff in the future.

fwyzard commented 6 years ago

@smuzaffar by the way, when linking a library or plugin, does SCRAM consider only the direct dependencies, or also the indirect (transitive) ones ?

For example, assuming that

libC.so depends on libB.so
libB.so depends on libA.so

when linking libC.so, does SCRAM pass the flags -lB -lA, or only -lB ?

For the CUDA device link step, we do need the first approach,i.e. passing all direct and indirect dependencies.

smuzaffar commented 6 years ago

scram pass all i.e. direct and indirect both. By the way, when you say cuda device link step then what step you are talking about? is it the creation of $(1)_cudadlink or final link step for shared lib/plugin which has .cu files too?

fwyzard commented 6 years ago

I mean the creation of the $(1)_cudadlink.o file - that should be the only place where we need the _nv.a libraries.

smuzaffar commented 6 years ago

@fwyzard , I see that you still have $$($(1)_objdir)/$(1)_cudadlink_nv.$(OBJEXT) for $(1)_cudaobjs is it correct ? if yes then how to generate this cudadlink_nv.$(OBJEXT) ?

fwyzard commented 6 years ago

Sorry, I was sure I had removed it everywhere - fixed now.

fwyzard commented 6 years ago

By the way, I was planning to build a new "patatrack" release later today or tomorrow morning - shall I wait for an update to the build rules and test them at the same time ?

smuzaffar commented 6 years ago

I need more testing, so please go ahead with the patatrack release.

fwyzard commented 6 years ago

OK, thanks for letting me know.

smuzaffar commented 6 years ago

@fwyzard , can you please check cmssw-config tag V05-07-30 (tip of master branch) along with your changes in https://github.com/cms-sw/cmsdist/pull/4168.

You need to do the following

git clone git@github.com:cms-sw/cmssw-config
create CMSSW dev area
replace dev-area/config/SCRAM with cmssw-config/SCRAM
replace dev-area/config/BuildFile.xml with cmssw-config/CMSSW_BuildFile.xml
update dev-area/config/toolbox/slc6_amd64_gcc630/tools/selected/cuda.xml to get the changes of cms-sw/cmsdist#4168
Now checkout cmssw packages with cuda code and see if scram properly build it.

fwyzard commented 6 years ago

@smuzaffar thanks - my simple test case at https://github.com/fwyzard/cmssw/tree/add_CUDA_linking_samples works fine following your instructions.

It does not use CUDA kernels inside CMSSW plugins - I will test that in a couple week's time.

fwyzard commented 6 years ago

By the way, I have updated https://github.com/cms-sw/cmsdist/pull/4168 to pick up your changes.

fwyzard commented 6 years ago

@smuzaffar I have run into a problem with the new build rules:

>> Cuda Device Link tmp/slc7_amd64_gcc700/src/RecoPixelVertexing/PixelTrackFitting/test/testEigenGPUNoFit_t/testEigenGPUNoFit_t_cudadlink.o 
/data/user/fwyzard/patatrack/build/slc7_amd64_gcc700.patatrack/slc7_amd64_gcc700/external/cuda/9.2.88-gnimlf/bin/nvcc \
    -dlink \
    -L/data/user/fwyzard/patatrack/build/slc7_amd64_gcc700.patatrack/tmp/BUILDROOT/84efe5fbe8f5b399087b45a7523ae6ab/opt/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_0_pre6_Patatrack/lib/slc7_amd64_gcc700 \
    -L/data/user/fwyzard/patatrack/build/slc7_amd64_gcc700.patatrack/tmp/BUILDROOT/84efe5fbe8f5b399087b45a7523ae6ab/opt/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_0_pre6_Patatrack/external/slc7_amd64_gcc700/lib \
    -L/data/user/fwyzard/patatrack/build/slc7_amd64_gcc700.patatrack/slc7_amd64_gcc700/external/cuda/9.2.88-gnimlf/lib64/stubs \
    -lcudadevrt \
    -gencode arch=compute_35,code=sm_35 \
    -gencode arch=compute_50,code=sm_50 \
    -gencode arch=compute_61,code=sm_61 \
    -O3 \
    -std=c++14 \
    --expt-relaxed-constexpr \
    --expt-extended-lambda \
    -L/data/user/fwyzard/patatrack/build/slc7_amd64_gcc700.patatrack/tmp/BUILDROOT/84efe5fbe8f5b399087b45a7523ae6ab/opt/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_0_pre6_Patatrack/static/slc7_amd64_gcc700 \
    -lRecoPixelVertexingPixelTrackFitting_nv \
    --compiler-options '-O2 \
        ... \
        -g \
        -std=c++14  \
        -fPIC   ' \
    tmp/slc7_amd64_gcc700/src/RecoPixelVertexing/PixelTrackFitting/test/testEigenGPUNoFit_t/testEigenGPUNoFit.cu.o \
    -o tmp/slc7_amd64_gcc700/src/RecoPixelVertexing/PixelTrackFitting/test/testEigenGPUNoFit_t/testEigenGPUNoFit_t_cudadlink.o

nvlink fatal   : unexpected object after cudadevrt (target: sm_35)
gmake: *** [tmp/slc7_amd64_gcc700/src/RecoPixelVertexing/PixelTrackFitting/test/testEigenGPUNoFit_t/testEigenGPUNoFit_t_cudadlink.o] Error 1

Looks like the CUDA LDFLAGS (-L... -L... -lcudadevrt) have to go after the .o files.

fwyzard commented 6 years ago

Or rather, I think we should have

.../testEigenGPUNoFit_t/testEigenGPUNoFit.cu.o
-lRecoPixelVertexingPixelTrackFitting_nv
-lcudadevrt

fwyzard commented 6 years ago

Indeed, it seems to work if -lRecoPixelVertexingPixelTrackFitting_nv comes before -lcudadevrt. E.g. if we swap the order of

$(call AdjustFlags,$1,,CUDA_LDFLAGS CUDA_FLAGS) and
$(DLINK_LIBDIR) $(call Tool_DependencyDLINK,$1) within define link_cuda_objs.

fwyzard commented 6 years ago

I've tested it locally and seems to work, you can see the changes in cms-sw/cmssw-config#66 / cms-sw/cmsdist#4179 .

smuzaffar commented 6 years ago

thanks @fwyzard for the pull request. It is merged now and will be part of today's 11h00 IB/

smuzaffar commented 6 years ago

@fwyzard , does new build rule work as expected? Can you close this issue?

fwyzard commented 6 years ago

I just got back from a few days away, I'll double check and let you know.

fwyzard commented 6 years ago

The private release built with these changes works fine, we can close this issue.

cms-sw / cmssw-config

support linking CUDA device code across different libraries #65