conda-forge / pytorch-cpu-feedstock

A conda-smithy repository for pytorch-cpu.
BSD 3-Clause "New" or "Revised" License

Splitting this package into manageable chunks #108

Open hmaarrfk opened 2 years ago

hmaarrfk commented 2 years ago


This package currently requires more than 16 builds, which have to be built manually to ensure that they complete in time on the CIs.

Step 1: No more git clone

rgommers identified that one time-consuming portion of the build process is cloning the repository. In my experience, cloning the 1.5 GB repo can take up to 10 minutes even on my powerful local machine, and I suspect it takes much longer on the CIs.

To avoid cloning, we will have to either list out all the submodules manually or make them conda-forge-installable dependencies.

I mostly got this working using a recursive script, which should help us keep the list maintained: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/109
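The core of such a script is just reading each repo's `.gitmodules` and recursing into every submodule it lists. A minimal one-level sketch (function name and structure are illustrative, not the actual script from the PR):

```python
import re

def list_submodules(gitmodules_text):
    """Parse the [submodule "..."] sections of a .gitmodules file into
    {name: {"path": ..., "url": ...}}. One level only; a recursive
    script would repeat this on each submodule's own .gitmodules."""
    modules, current = {}, None
    for line in gitmodules_text.splitlines():
        line = line.strip()
        header = re.match(r'\[submodule "(.+)"\]$', line)
        if header:
            current = modules.setdefault(header.group(1), {})
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return modules
```

Walking the output of this parser per checkout is what makes the dependency list regenerable instead of hand-maintained.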

Option 1: Split off dependencies

| Dependency | linux | mac | win | GPU Aware | PR | system deps |
|---|---|---|---|---|---|---|
| pybind11 |  |  |  | no | https://github.com/conda-forge/pybind11-feedstock | USE_SYSTEM_PYBIND11 |
| cub |  |  |  | no | https://github.com/conda-forge/cub-feedstock |  |
| eigen |  |  |  | no | https://github.com/conda-forge/eigen-feedstock | USE_SYSTEM_EIGEN_INSTALL |
| googletest |  |  |  | no | will not package |  |
| benchmark |  |  |  | no | https://github.com/conda-forge/benchmark-feedstock |  |
| protobuf |  |  |  | no | https://github.com/conda-forge/libprotobuf-feedstock |  |
| ios-cmake |  |  |  |  | not needed since we don't target iOS |  |
| NNPACK | yes | yes | no |  | https://github.com/conda-forge/staged-recipes/pull/19103 |  |
| gloo | yes | yes | yes |  | https://github.com/conda-forge/staged-recipes/pull/19103 | USE_SYSTEM_GLOO |
| pthreadpool | yes | yes | no |  | https://github.com/conda-forge/staged-recipes/pull/19103 | USE_SYSTEM_PTHREADPOOL |
| FXdiv | yes | yes | header |  | https://github.com/conda-forge/staged-recipes/pull/19103 | USE_SYSTEM_FXDIV |
| FP16 | yes | yes | header |  | https://github.com/conda-forge/staged-recipes/pull/19103 | USE_SYSTEM_FP16 |
| psimd | yes | yes | header |  | https://github.com/conda-forge/staged-recipes/pull/19103 | USE_SYSTEM_PSIMD |
| zstd | yes | yes | yes | no | https://github.com/conda-forge/zstd-feedstock |  |
| cpuinfo | yes | yes | no | no | https://github.com/conda-forge/staged-recipes/pull/19103 | USE_SYSTEM_CPUINFO |
| python-enum |  |  |  | no | https://github.com/conda-forge/enum34-feedstock |  |
| python-peachpy | yes | yes | yes | no | https://github.com/conda-forge/staged-recipes/pull/19103 |  |
| python-six | yes | yes | yes | no | https://github.com/conda-forge/six-feedstock |  |
| onnx |  |  |  | no | https://github.com/conda-forge/onnx-feedstock | USE_SYSTEM_ONNX |
| onnx-tensorrt |  |  |  | only |  |  |
| sleef |  |  |  | no | https://github.com/conda-forge/sleef-feedstock | USE_SYSTEM_SLEEF |
| ideep |  |  |  |  |  |  |
| oneapisrc |  |  |  |  |  |  |
| nccl |  |  |  |  | https://github.com/conda-forge/nccl-feedstock |  |
| gemmlowp |  |  |  |  |  |  |
| QNNPACK | yes | yes |  |  | https://github.com/conda-forge/staged-recipes/pull/19103 |  |
| neon2sse |  |  |  |  |  |  |
| fbgemm | yes |  |  |  |  |  |
| foxi |  |  |  |  |  |  |
| tbb |  |  |  |  | https://github.com/conda-forge/tbb-feedstock | USE_SYSTEM_TBB (deprecated) |
| fbjni |  |  |  |  |  |  |
| XNNPACK | yes | yes |  |  | https://github.com/conda-forge/staged-recipes/pull/19103 | USE_SYSTEM_XNNPACK |
| fmt |  |  |  |  | https://github.com/conda-forge/fmt-feedstock |  |
| tensorpipe | yes |  |  |  |  |  |
| cudnn_frontend |  |  |  |  |  |  |
| kineto |  |  |  |  |  |  |
| pocketfft |  |  |  |  |  |  |
| breakpad |  |  |  |  |  |  |
| flatbuffers | yes | yes | yes | no | https://github.com/conda-forge/flatbuffers-feedstock |  |
| clog | static | static |  |  | https://github.com/conda-forge/staged-recipes/pull/19103 |  |
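The "system deps" column lists the `USE_SYSTEM_*` toggles that PyTorch's build system understands. A hedged sketch of how a feedstock build script might flip them (the flag subset below is illustrative; which flags actually get set depends on which dependencies end up packaged):

```python
import os

# Illustrative subset of the USE_SYSTEM_* toggles from the table above.
# Setting one tells PyTorch's build to link against the system
# (conda-forge) copy instead of the vendored third_party checkout.
SYSTEM_DEPS = [
    "USE_SYSTEM_PYBIND11",
    "USE_SYSTEM_EIGEN_INSTALL",
    "USE_SYSTEM_GLOO",
    "USE_SYSTEM_PTHREADPOOL",
    "USE_SYSTEM_SLEEF",
    "USE_SYSTEM_XNNPACK",
]
for flag in SYSTEM_DEPS:
    os.environ[flag] = "1"
```

Each flag that can be enabled is one fewer submodule that needs to be fetched and compiled inside this feedstock.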

Option 2 - step 1: Build a libpytorch package or something

By setting BUILD_PYTHON=OFF in https://github.com/conda-forge/pytorch-cpu-feedstock/pull/112/ we then end up with the following libraries in lib and include:

| Dependency | linux | mac | win | GPU Aware | PR |
|---|---|---|---|---|---|
| libasmjit | yes | yes |  |  | https://github.com/conda-forge/staged-recipes/pull/19103 |
| libc10 | yes | yes |  |  | https://github.com/conda-forge/staged-recipes/pull/19103 |
| libfbgemm | yes | yes | yes |  | https://github.com/conda-forge/staged-recipes/pull/19103 |
| libgloo | yes | yes | yes |  |  |
| libkineto | yes | yes |  |  | https://github.com/conda-forge/staged-recipes/pull/19103 |
| libnnpack | yes | ??? |  |  | https://github.com/conda-forge/staged-recipes/pull/19103 |
| libpytorch_qnnpack | yes | yes |  |  | https://github.com/conda-forge/staged-recipes/pull/19103 |
| libqnnpack | yes | yes |  |  | https://github.com/conda-forge/staged-recipes/pull/19103 |
| libtensorpipe | yes |  |  |  |  |
| libtorch |  |  |  |  |  |
| libtorch_cpu |  |  |  |  |  |
| libtorch_global_deps |  |  |  |  |  |
| *Header only:* |  |  |  |  |  |
| ATen |  |  |  |  |  |
| c10d |  |  |  |  |  |
| caffe2 |  |  |  |  |  |
| libnop | yes | yes |  |  | https://github.com/conda-forge/staged-recipes/pull/19103 |

Option 2 - step 2: Depend on new ATen/libpytorch package

Compilation time progress

| platform | python | cuda | main | tar | gh-109 | system deps |
|---|---|---|---|---|---|---|
| linux-64 | 3.7 | no | 1h57m |  | 1h54m |  |
| linux-64 | 3.8 | no | 2h0m |  | 1h51m |  |
| linux-64 | 3.9 | no | 2h31m |  | 2h2m |  |
| linux-64 | 3.10 | no | 2h26m |  | 2h7m |  |
| linux-64 | 3.7 | 11.2 | 6h+ (3933/4242, 309 remaining) |  | 6h+ |  |
| linux-64 | 3.8 | 11.2 | 6h+ (3897/4242, 345 remaining) |  | 6h+ |  |
| linux-64 | 3.9 | 11.2 | 6h+ (3924/4242, 318 remaining) |  | 6h+ | 6h+ (1656/1969, 313 remaining) |
| linux-64 | 3.10 | 11.2 | 6h+ (3962/4242, 280 remaining) |  | 6h+ |  |
| osx-64 | 3.7 |  | 2h42m |  | 2h39m |  |
| osx-64 | 3.8 |  | 3h28m |  | 2h52m |  |
| osx-64 | 3.9 |  | 2h40m |  | 2h42m |  |
| osx-64 | 3.10 |  | 3h2m |  | 2h42m |  |
| osx-arm64 | 3.8 |  | 1h51m |  | 1h37m |  |
| osx-arm64 | 3.9 |  | 2h20m |  | 2h10m |  |
| osx-arm64 | 3.10 |  | 4h25m |  | 2h1m |  |
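To compare the main and gh-109 timings above more easily, a small helper (illustrative only) for turning the duration strings into minutes:

```python
import re

def minutes(duration):
    """Convert a timing-table entry like '1h57m' into total minutes."""
    hours, mins = re.fullmatch(r"(\d+)h(\d+)m", duration).groups()
    return int(hours) * 60 + int(mins)

# e.g. linux-64 / python 3.9, CPU only: main 2h31m vs gh-109 2h2m
saved = minutes("2h31m") - minutes("2h2m")  # 29 minutes
```

On the builds that finish, the per-build savings are on the order of minutes to tens of minutes, consistent with clone time being a meaningful but not dominant cost.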

There are approximately:

isuruf commented 10 months ago

Agree with @hmaarrfk. Please have a look at https://github.com/conda-forge/pytorch-cpu-feedstock/issues/114 too.

carterbox commented 10 months ago

I already started a discussion about standardizing the archs that feedstocks target in the conda-forge.github.io repo: https://github.com/conda-forge/conda-forge.github.io/issues/1901. I'd be happy to move the discussion there. I don't think the cuda-feedstock is the place for that discussion, because it's not an issue with the cuda package itself; it's a discussion about our channel policy, and is more similar to whether or not packages should target special instruction sets like AVX-512.