xref: https://github.com/pytorch/pytorch/pull/121975
An internal NVIDIA report pointed out a performance issue when working with conda-forge PyTorch + OpenMM. The issue can be mitigated by using the linker script from the above upstream PR. We should have a build step that sets `USE_PRIORITIZED_TEXT_FOR_LD=1` for linux-aarch64 + CUDA 12+. A further, optional step that we can take is to regenerate `cmake/prioritized_text.txt` at build time.
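The condition for the proposed build step could be sketched roughly as follows. This is only an illustration of the gating logic, not the recipe's actual build script: the helper names `wants_prioritized_text` and `build_env` are hypothetical, and it assumes the variant is described by conda-forge-style `target_platform` and `cuda_compiler_version` strings, with a non-numeric value such as `"None"` standing for a CPU-only build.

```python
def wants_prioritized_text(target_platform: str, cuda_compiler_version: str) -> bool:
    """Return True only for linux-aarch64 builds with CUDA 12 or newer.

    Both argument names mirror conda-forge variant keys (an assumption here).
    """
    if target_platform != "linux-aarch64":
        return False
    # Take the major version; a non-numeric value (e.g. "None") means no CUDA.
    major = cuda_compiler_version.split(".")[0]
    return major.isdigit() and int(major) >= 12


def build_env(target_platform: str, cuda_compiler_version: str, base_env=None) -> dict:
    """Return a copy of base_env, adding the flag when the variant qualifies."""
    env = dict(base_env or {})
    if wants_prioritized_text(target_platform, cuda_compiler_version):
        # The variable name is taken from the issue text above.
        env["USE_PRIORITIZED_TEXT_FOR_LD"] = "1"
    return env


if __name__ == "__main__":
    for platform, cuda in [("linux-aarch64", "12.6"),
                           ("linux-aarch64", "None"),
                           ("linux-64", "12.6")]:
        print(platform, cuda, build_env(platform, cuda))
```

In a real recipe the same check would most likely live in the shell build script as a conditional `export`. The Python form is used here only to make the condition easy to test in isolation.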