Mitigate performance degradation on linux-aarch64 CUDA builds by incorporating the linker script from upstream

conda-forge / pytorch-cpu-feedstock

Mitigate performance degradation on linux-aarch64 CUDA builds by incorporating the linker script from upstream #264

Open leofang opened 5 days ago

leofang commented 5 days ago

xref: https://github.com/pytorch/pytorch/pull/121975

An internal NVIDIA report pointed out a performance issue when working with conda-forge PyTorch + OpenMM. The issue can be mitigated by using the linker script from the above upstream PR. We should have a build step that sets USE_PRIORITIZED_TEXT_FOR_LD=1 for linux-aarch64 + CUDA 12+. A further, optional step that we can take is to regenerate cmake/prioritized_text.txt at build time.
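
As a rough illustration, the build step could look something like the sketch below. It assumes the feedstock's build script sees the usual conda-forge variables target_platform and cuda_compiler_version (with "None" for CPU-only builds), and that the PyTorch build picks up USE_PRIORITIZED_TEXT_FOR_LD from the environment. The helper name wants_prioritized_text is made up for this example and is not part of the feedstock or of PyTorch.

```shell
# Hypothetical helper (not in the feedstock): succeeds only for
# linux-aarch64 builds whose CUDA major version is 12 or newer.
#   $1: target platform, e.g. linux-aarch64
#   $2: CUDA compiler version, e.g. 12.0, or None for CPU-only builds
wants_prioritized_text() {
    pt_platform="$1"
    pt_cuda_version="$2"

    [ "${pt_platform}" = "linux-aarch64" ] || return 1

    # Keep only the major component ("12.0" -> "12").
    pt_cuda_major="${pt_cuda_version%%.*}"

    # Reject empty or non-numeric values such as "None".
    case "${pt_cuda_major}" in
        ''|*[!0-9]*) return 1 ;;
    esac

    [ "${pt_cuda_major}" -ge 12 ]
}

# Assumption: target_platform and cuda_compiler_version are provided by
# the conda-forge build environment; both default safely when unset.
if wants_prioritized_text "${target_platform:-}" "${cuda_compiler_version:-None}"; then
    export USE_PRIORITIZED_TEXT_FOR_LD=1
fi
```

Keeping the condition in a small function makes it easy to check the platform and version logic on its own before wiring it into the real build script. Regenerating cmake/prioritized_text.txt is not covered here.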

hmaarrfk commented 5 days ago

Yeah. I’ve been seeing the warning and wondering what it meant.

I’ve been meaning to look into it, but the aarch64 builds seem to be failing for other reasons.