hmaarrfk commented 2 years ago

Comment:

This package currently requires more than 16 builds to be build manually to ensure that it completes in time on the CIs.

Step 1: No more git clone

rgommers identified that one portion of the build process that takes time is cloning the repository. In my experience, cloning the 1.5GB repo can take up to 10 min on my powerful local machine, but I feel like it can take much longer on the CIs.

To avoid cloning, we will have to list out all the submodule manually, or make the conda-forge installable dependencies.

I mostly got this working using a recursive script which should help us keep it maintained: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/109

Option 1: Split off Dependencies:

Dependency	linux	mac	win	GPU Aware	PR	system deps
pybind11				no	https://github.com/conda-forge/pybind11-feedstock	USE_SYSTEM_PYBIND11
cub				no	https://github.com/conda-forge/cub-feedstock
eigen				no	https://github.com/conda-forge/eigen-feedstock	USE_SYSTEM_EIGEN_INSTALL
googletest				no	will not package
benchmark				no	https://github.com/conda-forge/benchmark-feedstock
protobuf				no	https://github.com/conda-forge/libprotobuf-feedstock
ios-cmake					not needed since we don't target ios
NNPACK	yes	yes		no	https://github.com/conda-forge/staged-recipes/pull/19103
gloo	yes	yes		yes	https://github.com/conda-forge/staged-recipes/pull/19103	USE_SYSTEM_GLOO
pthreadpool	yes	yes		no	https://github.com/conda-forge/staged-recipes/pull/19103	USE_SYSTEM_PTHREADPOOL
FXdiv	yes	yes		header	https://github.com/conda-forge/staged-recipes/pull/19103	USE_SYSTEM_FXDIV
FP16	yes	yes		header	https://github.com/conda-forge/staged-recipes/pull/19103	USE_SYSTEM_FP16
psimd	yes	yes		header	https://github.com/conda-forge/staged-recipes/pull/19103	USE_SYSTEM_PSIMD
zstd	yes	yes	yes	no	https://github.com/conda-forge/zstd-feedstock
cpuinfo	yes	yes	no	no	https://github.com/conda-forge/staged-recipes/pull/19103	USE_SYSTEM_CPUINFO
python-enum				no	https://github.com/conda-forge/enum34-feedstock
python-peachpy	yes	yes	yes	no	https://github.com/conda-forge/staged-recipes/pull/19103
python-six	yes	yes	yes	no	https://github.com/conda-forge/six-feedstock
onnx				no	https://github.com/conda-forge/onnx-feedstock	USE_SYSTEM_ONNX
onnx-tensorrt				only
sleef				no	https://github.com/conda-forge/sleef-feedstock	USE_SYSTEM_SLEEF
ideep
oneapisrc
nccl					https://github.com/conda-forge/nccl-feedstock
gemmlowp
QNNPACK	yes	yes			https://github.com/conda-forge/staged-recipes/pull/19103
neon2sse
fbgemm				yes
foxi
tbb					https://github.com/conda-forge/tbb-feedstock	USE_SYSTEM_TBB (deprecated)
fbjni
XNNPACK	yes	yes			https://github.com/conda-forge/staged-recipes/pull/19103	USE_SYSTEM_XNNPACK
fmt					https://github.com/conda-forge/fmt-feedstock
tensorpipe				yes
cudnn_frontend
kineto
pocketfft
breakpad
flatbuffers	yes	yes	yes	no	https://github.com/conda-forge/flatbuffers-feedstock
clog	static	static			https://github.com/conda-forge/staged-recipes/pull/19103

clog seems to be a pretty low level library that is assisted by compile time flags. I think it is best if we don't package that one as a library. It seems like it will require some serious consideration in terms of performance if we do. They typically the full source in the repository. The only problematic thing, is that each package attempts to install the static library into the library path.
QNNPACK has a build option to allow a special provision for CAFFE2's implementation of pthreadpool
- It seems to be problematic with pthreadpool on OSX.
QNNPACK likely has two different implementations, the one they vendored in ATen, and the one they vendored in third_party.
NNPACK has two different backens, one generated by python it seems, but for some reason fp16.py cannot be found, the other with psimd.

Option 2 - step 1: Build a libpytorch package or something

By setting BUILD_PYTHON=OFF in https://github.com/conda-forge/pytorch-cpu-feedstock/pull/112/ we then end up with the following libraries in lib and include:

Dependency	linux	mac	GPU Aware	PR
libasmjit	yes	yes		https://github.com/conda-forge/staged-recipes/pull/19103
libc10	yes	yes		https://github.com/conda-forge/staged-recipes/pull/19103
libfbgemm	yes	yes	yes	https://github.com/conda-forge/staged-recipes/pull/19103
libgloo	yes	yes	yes
libkineto	yes		yes	https://github.com/conda-forge/staged-recipes/pull/19103
libnnpack	yes		???	https://github.com/conda-forge/staged-recipes/pull/19103
libpytorch_qnnpack	yes	yes		https://github.com/conda-forge/staged-recipes/pull/19103
libqnnpack	yes	yes		https://github.com/conda-forge/staged-recipes/pull/19103
libtensorpipe			yes
libtorch
libtorch_cpu
libtorch_global_deps
Header only
ATen
c10d
caffe2
libnop	yes	yes		https://github.com/conda-forge/staged-recipes/pull/19103

Option 2 - step 2: Depend on new ATen/libpytorch package

Compilation time progress

platform	python	cuda	main	tar gh-109	system deps
linux 64	3.7	no	1h57m	1h54m
linux 64	3.8	no	2h0m	1h51m
linux 64	3.9	no	2h31m	2h2m
linux 64	3.10	no	2h26m	2h7m
linux 64	3.7	11.2	6h+ (`3933/4242` 309 remaining)	6h+
linux 64	3.8	11.2	6h+ (`3897/4242` 345 remaning)	6h+
linux 64	3.9	11.2	6h+ (`3924/4242` 318 remaining)	6h+	6h+`1656/1969` 313 remaining
linux 64	3.10	11.2	6h+ (`3962/4242` 280 remaining)	6h+
osx-64	3.7		2h42m	2h39m
osx-64	3.8		3h28m	2h52m
osx-64	3.9		2h40m	2h42m
osx-64	3.10		3h2m	2h42m
osx-arm-64	3.8		1h51	1h37m
osx-arm-64	3.9		2h20m	2h10m
osx-arm-64	3.10		4h25m	2h1m

There are approximately:

3600 files to compile for cmake for the CPU builds with the standard build process
1600-1800 files to compile when using system dependencies: https://github.com/conda-forge/pytorch-cpu-feedstock/pull/111

isuruf commented 10 months ago

Agree with @hmaarrfk. Please have a look at https://github.com/conda-forge/pytorch-cpu-feedstock/issues/114 too.

carterbox commented 10 months ago

I already started a discussion about standardizing the archs that feedstocks target at the conda-forge.github.io repo https://github.com/conda-forge/conda-forge.github.io/issues/1901 I'd be happy to move the discussion there. I don't think the cuda-feedstock is the place for that discussion because it's not an issue with the cuda package itself, it's a discussoin about our channel policy and is more similar to whether on not packages should target special instruction sets like AVIX-512.

conda-forge / pytorch-cpu-feedstock

Splitting this package in managable chunks #108