@psychocoderHPC let us please collect all CUDA bugs that we noticed in PIConGPU.
Currently, and specific to CUDA 6.5: if none of these bugs showed up before (CUDA 5.0-6.0) and all of them are fixed in CUDA 7.0 (rc), we might consider either requiring CUDA 7.0+ or dropping CUDA 6.5 support entirely.
Currently applied work-arounds in PIConGPU for CUDA Bugs
CUDA 5.5+
#475 -(-static_cast<>(...)) in loops, forum -> upstream bug report number? Fixed in 6.0+? -> work-arounds removed (refactored to short ifs) in #680
#538 #539 #2831(?) "zero if not volatile" bug -> needs upstream bug report!
#401 unique id for (external?) __shared__ memory required -> needs upstream bug report!
#199 (external?) __shared__ memory cannot be created by template -> needs upstream RFE!
CUDA 6.5
#570 numBins *= int( numBins > 0); bug; we found that short ifs (?:) are efficient, too (no work-arounds applied, but the code was refactored) -> NVIDIA bug report number 1604407 (@psychocoderHPC)
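Both forms of the #570 clamp side by side, written as plain C++ so they can be checked on the host (the function names are hypothetical). The arithmetic form multiplies by the 0/1 result of the comparison; the refactored form uses a short if. They compute the same value, so the refactoring does not change behavior; this host example does not reproduce the CUDA 6.5 miscompilation itself.

```cpp
#include <cassert>  // assert() for checking that both variants agree

// Original pattern: multiply by the result of the comparison to zero out
// non-positive values (the line that was hit by the CUDA 6.5 bug).
inline int clampArithmetic(int numBins)
{
    numBins *= int(numBins > 0);
    return numBins;
}

// Refactored pattern: a short if (conditional operator) expresses the same
// clamp and, per the note, is just as efficient.
inline int clampShortIf(int numBins)
{
    return numBins > 0 ? numBins : 0;
}
```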
CUDA 6.5-7.0rc
#655 loop optimization bug -> not (yet) reproducible with a minimal example -> needs upstream bug report (solved in 7.0)
CUDA 6.5 & 7.0
Alignment issues spotted with pointer members in structs (work-around: declare them last) #762 #1579
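A sketch of the #762/#1579 work-around, assuming a struct that mixes scalar and pointer members (the struct names are hypothetical). It only shows the reordering; it does not reproduce the alignment problem itself, which the note attributes to CUDA 6.5 and 7.0.

```cpp
#include <cstddef>  // offsetof

// Members in arbitrary order: the pointer sits between scalar members.
struct Mixed
{
    char flag;
    float* data;
    int count;
};

// Work-around: all pointer members are declared after the scalar members.
struct PointersLast
{
    char flag;
    int count;
    float* data;
};
```

As a side effect, the reordered struct is never larger than the mixed one, because less padding is needed in front of the pointer.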
CUDA 6.5+
Calling constexpr functions in constexpr contexts does not work (@flamefire, upstream bug report #1659894)
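The note does not spell out the failing code. The following is a hedged guess at the kind of valid C++11 pattern meant: a constexpr function that calls another constexpr function, evaluated in a constexpr context. The function names are hypothetical, and whether this exact snippet triggers the nvcc problem is an assumption; it is shown as a compile-time test case.

```cpp
// C++11 constexpr functions consist of a single return statement.
constexpr int square(int x) { return x * x; }

// A constexpr function that calls another constexpr function.
constexpr int sumOfSquares(int a, int b) { return square(a) + square(b); }

// Must be evaluated at compile time: used in a constexpr variable and a static_assert.
constexpr int value = sumOfSquares(3, 4);
static_assert(value == 25, "sumOfSquares must be a constant expression");
```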
CUDA 7.0
C++11 usage of decltype in specific templates is broken, e.g., visible in boost::result_of in decltype mode (i.e., BOOST_RESULT_OF_USE_TR1 not defined) -> Boost work-around applied #1151
Alignment above 32 bytes seems to break on some devices with gcc #1563
a, b and c are reserved https://github.com/ComputationalRadiationPhysics/mallocMC/pull/108
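A hedged sketch of a compile-time guard against the #1563 problem: types keep their alignment at or below the 32-byte threshold from the note, and a static_assert rejects anything larger. The constant maxSafeAlignment and the type Vec8f are hypothetical illustrations, not code from PIConGPU.

```cpp
#include <cstddef>  // std::size_t

// Threshold from the note: alignment above 32 bytes was seen to break on
// some devices (CUDA 7.0 with gcc as host compiler). Name is hypothetical.
constexpr std::size_t maxSafeAlignment = 32;

// Hypothetical vector type: 8 floats, aligned to exactly 32 bytes.
struct alignas(32) Vec8f
{
    float v[8];
};

// Compile-time guard: the build fails if the type asks for more than 32 bytes.
static_assert(alignof(Vec8f) <= maxSafeAlignment,
              "alignment above 32 bytes was seen to break with CUDA 7.0");
```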
CUDA 7.5+
Compilation with -G and --relaxed-constexpr breaks usage of map (@flamefire upstream bug report #1690424)
Constant expressions that are valid in C++98, e.g., in compile-time asserts, only work with the C++11 constexpr keyword -> needs upstream bug report #1290
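The two styles from #1290 side by side (the struct and typedef names are hypothetical). In C++98, an in-class initialized static const integral member is already a constant expression and may be used in a compile-time assert; the note reports that with CUDA 7.5+ only the constexpr variant works. Both variants below are valid C++11 and compile with a host compiler.

```cpp
// C++98 style: in-class initialized static const integral member.
struct Cpp98Traits
{
    static const int dim = 3;
};

// Classic C++98 compile-time assert: a negative array size fails to compile.
typedef char Cpp98DimCheck[(Cpp98Traits::dim == 3) ? 1 : -1];

// C++11 style, reported as the only working variant with CUDA 7.5+.
struct Cpp11Traits
{
    static constexpr int dim = 3;
};

static_assert(Cpp11Traits::dim == 3, "dim must be 3");
```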
CUDA 10+
templates for kernel generation outside vs. inside members can generate invalid device code #3015
Third Party Bugs with CUDA
Boost
#105 #560 __noinline__ -> reported (391854 -> 1422182 @ax3l) and fixed with CUDA 6.5+ and Boost 1.56+ (the work-around stays in PIConGPU until we require at least these versions; work-around in pre-0.2.0 releases)
#1324 Boost 1.60.0, C++11 and CUDA <7.5 (variadic templates not working)
#1314 Boost 1.60.0 with nvcc & C++98 is broken without a work-around: apply this patch to Boost or use C++11
Fixed PIConGPU bugs
Not CUDA bugs, but PIConGPU bugs that were only noticed with newer CUDA versions (incomplete list, low priority for this collection)
#401 byRef in TaskSetValue functors + unique id for external shared memory required (CUDA 5.5+)
#483 #484 static on functions gives them internal linkage in C++ (honored correctly in CUDA 6.5+)