This WIP PR implements proper dead code elimination. It replaces the lazy-loading approach with one that merges everything into a single module, internalizes symbols, and then runs global DCE. This removes any code that kernels do not use, and it also improves performance because libnvvm does less useless work.
The PTX generated for `path_tracer` is now about 3.7 kloc, down from ~20 kloc.
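The toy Rust model below shows why internalizing before global DCE matters. It is not the actual codegen code: `Function`, `Linkage`, `merge`, `internalize`, and `global_dce` are hypothetical names. In this sketch, every function starts out externally visible, as it would in separately compiled modules. After merging, every non-kernel function gets internal linkage, so the kernels become the only roots. Anything a kernel cannot reach through calls is dropped.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical linkage model: external symbols are visible outside the module
/// and therefore act as DCE roots.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Linkage {
    External,
    Internal,
}

#[derive(Clone, Debug)]
struct Function {
    name: String,
    is_kernel: bool,
    linkage: Linkage,
    callees: Vec<String>,
}

impl Function {
    /// Every function starts out external, as in separately compiled modules.
    fn new(name: &str, is_kernel: bool, callees: &[&str]) -> Self {
        Function {
            name: name.to_string(),
            is_kernel,
            linkage: Linkage::External,
            callees: callees.iter().map(|c| c.to_string()).collect(),
        }
    }
}

/// Merge several modules into one (assumes no symbol name clashes).
fn merge(modules: Vec<Vec<Function>>) -> Vec<Function> {
    modules.into_iter().flatten().collect()
}

/// Give every non-kernel function internal linkage so it is no longer a root.
fn internalize(funcs: &mut [Function]) {
    for f in funcs.iter_mut() {
        if !f.is_kernel {
            f.linkage = Linkage::Internal;
        }
    }
}

/// Keep only functions reachable from externally visible ones.
fn global_dce(funcs: Vec<Function>) -> Vec<Function> {
    let live: HashSet<String> = {
        let by_name: HashMap<&str, &Function> =
            funcs.iter().map(|f| (f.name.as_str(), f)).collect();
        // Start the worklist from all external functions (the roots).
        let mut stack: Vec<&str> = funcs
            .iter()
            .filter(|f| f.linkage == Linkage::External)
            .map(|f| f.name.as_str())
            .collect();
        let mut live = HashSet::new();
        while let Some(name) = stack.pop() {
            // On the first visit, mark the function live and queue its callees.
            if live.insert(name.to_string()) {
                if let Some(&f) = by_name.get(name) {
                    stack.extend(f.callees.iter().map(|c| c.as_str()));
                }
            }
        }
        live
    };
    // Drop unreachable functions, preserving the original order.
    funcs.into_iter().filter(|f| live.contains(&f.name)).collect()
}
```

Consider a kernel `render` that calls `trace`, which calls `sample`, alongside unreferenced helpers. Running `global_dce` alone keeps every function, because every function is still external and therefore a root. Running `internalize` first leaves only `render`, `trace`, and `sample`.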
Things left to do:
- [x] Update comments and docs to reflect that we aren't using lazy-loading anymore
- [x] Maybe add a `#[used]` attribute macro to mark a function that should not be eliminated because it might be linked against in the future.
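The sketch below shows one way such a marker could interact with internalization. It assumes a hypothetical `marked_used` flag that a `#[used]`-style attribute would set; the flag and all names here are illustrative, not the PR's code. Functions that are kernels or are marked as used keep external linkage, so global DCE would treat them as roots and never remove them.

```rust
/// Hypothetical linkage of a function in the merged module.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Linkage {
    External,
    Internal,
}

#[derive(Debug)]
struct Function {
    name: String,
    is_kernel: bool,
    /// Hypothetical flag set by a `#[used]`-style attribute on the Rust side.
    marked_used: bool,
    linkage: Linkage,
}

impl Function {
    fn new(name: &str, is_kernel: bool, marked_used: bool) -> Self {
        Function {
            name: name.to_string(),
            is_kernel,
            marked_used,
            linkage: Linkage::External,
        }
    }
}

/// Internalize everything except kernels and explicitly marked functions.
fn internalize(funcs: &mut [Function]) {
    for f in funcs.iter_mut() {
        if !f.is_kernel && !f.marked_used {
            f.linkage = Linkage::Internal;
        }
    }
}

/// Names of functions that stay externally visible, i.e. the DCE roots.
fn dce_roots(funcs: &[Function]) -> Vec<&str> {
    funcs
        .iter()
        .filter(|f| f.linkage == Linkage::External)
        .map(|f| f.name.as_str())
        .collect()
}
```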