jrhemstad opened this issue 1 year ago
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
Is your feature request related to a problem? Please describe.
As a user of CUTLASS, I would like to build a shared object library, `libA.so`, that internally uses CUTLASS function templates, including `__global__` function templates.
Today, CUTLASS does nothing to restrict the visibility of its `__global__` function templates or any other host function templates. By default, their instantiations are emitted as weak symbols with default (exported) visibility. In short, this means that if I link two dynamic libraries, `A.so` and `B.so`, into my application and both contain identical instantiations of a CUTLASS template, the linker will keep only one of the two instantiations and use it for both libraries. This can lead to disastrous and insidious issues like spurious silent failures.
This issue is true of any header-only C++ template library, but it is particularly bad for CUDA C++ libraries that ship `__global__` function templates. Consider this trivial example of one of many ways things can go wrong. The following code has two TUs:
Each TU is compiled for its respective architecture and has a single function (`volta()` or `pascal()`, respectively) that queries and prints the `ptxVersion` of a `kernel<void>` using `cudaFuncGetAttributes`. These TUs are linked into a program that determines the compute capability of device 0 and invokes `volta()` or `pascal()` accordingly.
One would expect that invoking `volta()` would always print 70 and invoking `pascal()` would print 60. However, this is not the case. As described above, the kernel template has weak linkage, so when linking the `volta.o` and `pascal.o` TUs together, the linker selects one of the instantiations of `kernel<void>` and discards the other. The end result is that the program prints 60 or 70 depending on which instantiation the linker picked, not on which function was called.
TL;DR:
Describe the solution you'd like
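Luckily, the solution is quite simple. Every host template function (including `__global__` functions) in CUTLASS should be annotated with `__attribute__((visibility("hidden")))`. This makes the symbol hidden in any resulting dynamic library, so each library keeps its own instantiation instead of exporting it to be shared with (or replaced by) another library's copy.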
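A minimal sketch of what that annotation could look like for a host function template follows. `EXAMPLE_HIDDEN` and `sum` are hypothetical names for illustration only, not existing CUTLASS identifiers, and the sketch assumes a GCC or Clang toolchain for the attribute to take effect.

```cpp
#include <cassert>
#include <cstddef>

// EXAMPLE_HIDDEN is a hypothetical macro used only for this sketch.
// On GCC/Clang it marks the symbol as hidden, so each shared library that
// instantiates the template keeps a private copy instead of exporting it
// for the dynamic linker to merge with another library's copy.
#if defined(__GNUC__) || defined(__clang__)
#  define EXAMPLE_HIDDEN __attribute__((visibility("hidden")))
#else
   // Other toolchains (e.g. MSVC) do not export symbols from a DLL unless
   // explicitly asked to, so the macro expands to nothing there.
#  define EXAMPLE_HIDDEN
#endif

// A host function template carrying the annotation. In a .cu file the same
// macro would also be placed on __global__ templates, e.g.
//   template <typename T> __global__ EXAMPLE_HIDDEN void kernel(T* out);
template <typename T>
EXAMPLE_HIDDEN T sum(const T* data, std::size_t n) {
  assert(data != nullptr || n == 0);  // precondition: valid input range
  T total{};
  for (std::size_t i = 0; i < n; ++i) {
    total += data[i];
  }
  return total;
}
```

Callers inside the library use the template exactly as before; only the library's exported symbol table changes. One way to check the effect is to compare `nm -D --defined-only` output for the library built with and without the macro.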
Additional Context
We've been bitten by this in Thrust/CUB several times over the years.
Like CUTLASS, Thrust and CUB also allow users to customize the namespace in order to differentiate the symbols and avoid this problem. However, this solution is not robust. First, it requires every user to remember to customize the namespace. Second, it's possible for users to properly customize the namespace and still run afoul of the issues described above.
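For comparison, a rough sketch of the namespace-customization workaround is below. `EXAMPLE_NS` and `clamp_nonnegative` are hypothetical names, not the actual Thrust, CUB, or CUTLASS spelling. Because the namespace becomes part of every mangled symbol name, `libA.so` and `libB.so` only end up with distinct symbols if each build remembers to pick a different value.

```cpp
// EXAMPLE_NS is a hypothetical configuration macro. Each library embedding
// this header is expected to override it, e.g. -DEXAMPLE_NS=liba_example.
// If two libraries forget (or pick the same name), their instantiations
// collide exactly as before.
#ifndef EXAMPLE_NS
#define EXAMPLE_NS example
#endif

namespace EXAMPLE_NS {

// The mangled name of each instantiation includes EXAMPLE_NS, which is what
// keeps differently-configured copies apart at link time.
template <typename T>
T clamp_nonnegative(T x) {
  return x < T(0) ? T(0) : x;
}

}  // namespace EXAMPLE_NS
```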
See: