NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Updating missing build dependency in pyproject.toml #1680

Open · loadams opened this pull request 1 year ago

calebho commented 1 year ago

I think you need to add torch as well, because torch is also a build dependency. This dependency isn't straightforward: if you want to build the CUDA extensions, you need to install torch from a different index (i.e. the --index-url bit in the PyTorch installation instructions for pip + Linux + CUDA XX.YY) depending on which CUDA version you're using. I think the README would also need to be updated to explain this.
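For concreteness, a minimal sketch of what declaring both build dependencies in pyproject.toml might look like; the exact requirements list is an assumption, not the contents of this PR:

```toml
# pyproject.toml -- sketch only; the exact build requirements are an assumption
[build-system]
requires = [
    "setuptools",
    "packaging",
    "torch",  # problematic: pip resolves this from PyPI inside the isolated
              # build env, which may not match the CUDA build of torch that
              # the user actually runs
]
build-backend = "setuptools.build_meta"
```

Under build isolation, pip installs these requirements from its configured index (PyPI by default), so the torch used at build time is not necessarily the CUDA-specific wheel from, e.g., https://download.pytorch.org/whl/cu118.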

crcrpar commented 1 year ago

That also sounds reasonable. FWIW, the example commands in the README use --no-build-isolation, which could make the situation simpler.
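For reference, the README's CUDA-extension install command looks roughly like this (quoted from memory; the exact flags may have changed since):

```bash
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation \
    --config-settings "--build-option=--cpp_ext" \
    --config-settings "--build-option=--cuda_ext" ./
```

With --no-build-isolation, pip builds apex in the current environment rather than a fresh one, so whatever torch is already installed is the one the build sees.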

loadams commented 1 year ago

@calebho - correct, just adding packaging still leaves torch as a missing dependency. But there must be a way to do this without having each person modify the --index-url for their specific version, right?

calebho commented 1 year ago

@loadams Not sure; you'd have to test it out yourself. As for @crcrpar's suggestion: --no-build-isolation ignores declared build dependencies entirely, but it shifts the responsibility onto users to install the correct build dependencies beforehand.
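In practice that workflow might look like the following; the cu118 index URL is only an illustrative example, and users would pick the index matching their CUDA toolkit:

```bash
# Step the user now owns: install the build dependencies up front, from the
# index that matches the local CUDA toolkit (cu118 here is only an example).
pip install packaging
pip install torch --index-url https://download.pytorch.org/whl/cu118

# Then build apex against that torch, skipping the isolated build environment.
pip install -v --no-build-isolation ./
```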

Quentin-Anthony commented 9 months ago

I propose keeping packaging and removing torch in this PR. This combination works for me and several others across a range of systems and environments.

I can't see the DeepSpeed build issue in https://github.com/NVIDIA/apex/pull/1680#issuecomment-1590015630 anymore, but I suspect that's an edge case for which apex can simply recommend --no-build-isolation in the README. I can add a line to the README install section to that effect if everyone's on board.
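A sketch of what the [build-system] table would look like under this proposal (the exact contents are an assumption, not the PR diff):

```toml
# Sketch of the proposal: torch is dropped from the build requirements, so a
# suitable torch must already be installed before building (hence the proposed
# README note recommending --no-build-isolation for cases like the DeepSpeed one).
[build-system]
requires = ["setuptools", "packaging"]
build-backend = "setuptools.build_meta"
```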