ahendriksen closed this issue 5 hours ago
I have attached a trace of the compile time. It can be viewed at perfetto.dev.
It turns out that a large portion of the time is spent preprocessing the CUDA fp16 and bf16 headers. They are transitively included as follows:
Yep, looks like the extended FP type headers are quite expensive, and since they are included as part of the CCCL config, they affect every translation unit. @miscco could we consider only defining _CCCL_HAS_NVFP16 and _CCCL_HAS_NVBF16 in the CCCL config headers and leaving it up to downstream libraries and users to include the corresponding headers themselves?
yeah that would definitely be better
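A minimal sketch of the idea proposed above, to make it concrete. This is not CCCL's actual config code: the EXAMPLE_HAS_NVFP16 and EXAMPLE_HAS_NVBF16 macros and the example_uses_fp16 helper are hypothetical stand-ins, and detecting the headers with __has_include is an assumption about how availability might be checked. The point is only that the config part defines flags, while the expensive include happens in the translation unit that needs it.

```cpp
// --- Hypothetical config header: feature detection only, no heavy includes ---
#if defined(__has_include)
#  if __has_include(<cuda_fp16.h>)
#    define EXAMPLE_HAS_NVFP16 1
#  endif
#  if __has_include(<cuda_bf16.h>)
#    define EXAMPLE_HAS_NVBF16 1
#  endif
#endif
#ifndef EXAMPLE_HAS_NVFP16
#  define EXAMPLE_HAS_NVFP16 0
#endif
#ifndef EXAMPLE_HAS_NVBF16
#  define EXAMPLE_HAS_NVBF16 0
#endif

// --- Downstream translation unit: opts in to the fp16 header explicitly ---
#if EXAMPLE_HAS_NVFP16
#  include <cuda_fp16.h>  // the preprocessing cost is paid only here
#endif

// Hypothetical helper so downstream code can branch on availability.
constexpr bool example_uses_fp16() { return EXAMPLE_HAS_NVFP16 != 0; }
```

With this split, a translation unit that never touches fp16 or bf16 types would only see two macro definitions from the config header.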
Is this a duplicate?
Type of Bug
Performance
Component
libcu++
Describe the bug
Including <cuda/ptx> takes ~800ms on my workstation.
How to Reproduce
Compare the time to compile an empty file, a file including cuda/ptx, and a file including cuda/std/__type_traits/integral_constant.h (which is included from cuda/ptx).
Expected behavior
This should not be a heavy header.
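The comparison described under How to Reproduce could be scripted roughly as follows. This is a sketch, not the harness used for the measurements in this issue: the helper names (write_tu, time_command_ms, compile_command) are hypothetical, and it assumes nvcc is on PATH, that the CCCL headers are on its include path, and that /dev/null is available as an output sink.

```cpp
#include <chrono>
#include <cstdlib>
#include <fstream>
#include <string>

// Write a one-line translation unit; an empty `header` produces an empty file.
inline void write_tu(const std::string& path, const std::string& header) {
  std::ofstream out(path);
  if (!header.empty()) {
    out << "#include <" << header << ">\n";
  }
}

// Run a shell command and return its wall-clock duration in milliseconds.
inline double time_command_ms(const std::string& cmd) {
  const auto start = std::chrono::steady_clock::now();
  const int status = std::system(cmd.c_str());
  const auto stop = std::chrono::steady_clock::now();
  (void)status;  // the exit status is ignored in this sketch
  return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Build the compile command (assumption: nvcc is on PATH).
inline std::string compile_command(const std::string& path) {
  return "nvcc -c " + path + " -o /dev/null";
}
```

A driver would call write_tu for an empty file, for cuda/ptx, and for cuda/std/__type_traits/integral_constant.h, then pass each compile_command result to time_command_ms. Subtracting the empty-file time gives a rough per-header cost. Single runs are noisy, so repeating each measurement a few times is advisable.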
Reproduction link
No response
Operating System
Ubuntu Linux 22.04
nvidia-smi output
NA
NVCC version
The benchmark was performed using a prerelease version of nvcc, but the result should be reproducible with any recent version.