conda / conda-build

Commands and tools for building conda packages
https://docs.conda.io/projects/conda-build/

Parallel build support #5449

Open: cjac opened this issue 1 month ago

cjac commented 1 month ago


What is the idea?

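Conda could build packages in parallel. After analyzing the DAG of package dependencies, the leaf nodes (packages with no unbuilt dependencies) could be built concurrently, followed by each package higher in the hierarchy as soon as all of its dependencies are done. Most of my system sits idle during installation of conda packages.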
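One way to make "leaf nodes first, then their hierarchy" concrete is to group the DAG into waves with a level-by-level variant of Kahn's topological sort. This is a minimal sketch, not conda's or conda-build's code: it assumes the dependency graph is already available as a plain dict (`deps`), and the function name `build_waves` and the package names are hypothetical.

```python
from collections import defaultdict


def build_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group packages into waves (Kahn's algorithm, level by level).

    deps maps a package name to the set of package names it depends on.
    Every package in a wave depends only on packages in earlier waves,
    so all packages within one wave could be built concurrently.
    """
    # Include dependencies that never appear as keys (they are leaves).
    all_pkgs = set(deps) | {d for ds in deps.values() for d in ds}
    # Number of not-yet-built dependencies per package (its in-degree).
    remaining = {p: len(deps.get(p, ())) for p in all_pkgs}
    # Reverse edges: which packages are waiting on a given package.
    dependents = defaultdict(list)
    for pkg, ds in deps.items():
        for d in ds:
            dependents[d].append(pkg)

    waves: list[list[str]] = []
    wave = sorted(p for p, n in remaining.items() if n == 0)  # leaf nodes
    while wave:
        waves.append(wave)
        next_wave = []
        for pkg in wave:
            for child in dependents[pkg]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    next_wave.append(child)
        wave = sorted(next_wave)

    # Anything never scheduled is part of a cycle.
    if sum(len(w) for w in waves) != len(all_pkgs):
        raise ValueError("dependency cycle detected")
    return waves


# Hypothetical graph: app needs pyfoo and libbar; pyfoo needs libfoo.
deps = {"libfoo": set(), "libbar": set(), "pyfoo": {"libfoo"}, "app": {"pyfoo", "libbar"}}
print(build_waves(deps))  # [['libbar', 'libfoo'], ['pyfoo'], ['app']]
```

A limitation of the wave approach is that each wave waits for its slowest member before the next one starts; a scheduler that launches each package as soon as its own dependencies finish avoids that stall.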


Why is this needed?

Tests for RAPIDS[1], which include installing CUDA tooling, dask, pandas, and other ML tools, take a very long time and spend a good portion of the workflow blocked on a single-threaded application.

[1] https://github.com/GoogleCloudDataproc/initialization-actions/pull/1219

What should happen?

The work should be broken down into a DAG and delegated to worker threads, à la make -j$(nproc).
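Delegating the DAG to worker threads the way make -j does can be sketched with Python's standard concurrent.futures module: a package is submitted as soon as its own dependencies have finished, and at most `jobs` builds run at once. This is an illustrative sketch under stated assumptions, not an existing conda-build API: `build_parallel`, `deps`, and the `build` callback are hypothetical names, and a real implementation would have to handle concerns this sketch ignores.

```python
import os
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Optional


def build_parallel(
    deps: dict[str, set[str]],
    build: Callable[[str], None],
    jobs: Optional[int] = None,
) -> list[str]:
    """Call build(pkg) for every package in the DAG, like `make -j jobs`.

    A package is submitted as soon as all of its dependencies have finished;
    at most `jobs` builds (default: CPU count) run at the same time.
    Returns the packages in the order their builds completed.
    """
    all_pkgs = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {p: len(deps.get(p, ())) for p in all_pkgs}
    dependents: dict[str, list[str]] = {p: [] for p in all_pkgs}
    for pkg, ds in deps.items():
        for d in ds:
            dependents[d].append(pkg)

    ready = sorted(p for p, n in remaining.items() if n == 0)  # leaf nodes
    done: list[str] = []
    with ThreadPoolExecutor(max_workers=jobs or os.cpu_count()) as pool:
        running: dict[Future, str] = {}
        while ready or running:
            # Hand every package whose dependencies are satisfied to the pool.
            while ready:
                pkg = ready.pop()
                running[pool.submit(build, pkg)] = pkg
            # Block until at least one in-flight build finishes.
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in finished:
                pkg = running.pop(fut)
                fut.result()  # re-raise the exception if this build failed
                done.append(pkg)
                # Unblock dependents whose last outstanding dependency this was.
                for child in dependents[pkg]:
                    remaining[child] -= 1
                    if remaining[child] == 0:
                        ready.append(child)

    # Packages never started are stuck in a cycle.
    if len(done) != len(all_pkgs):
        raise ValueError("dependency cycle detected")
    return done
```

Calling `build_parallel(deps, build=my_build, jobs=os.cpu_count())` (with `my_build` standing in for whatever actually builds one package) mirrors `make -j$(nproc)`. Because scheduling is per package rather than per level, a slow package only delays the packages that actually depend on it.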

Additional Context

I appreciate the work done on parallelizing package downloads. I've set export CONDA_FETCH_THREADS="$(nproc)" to accelerate that portion of the workflow.

cjac commented 1 month ago

For the record, here is the command that takes a while to run. I am running it on a Rocky Linux 8 base image. I can gather metrics for the Debian and Ubuntu variants as well if that would help.

time conda create -n rapids-24.06 -c rapidsai -c conda-forge -c nvidia rapids=24.06 python=3.11 cuda-version=12.4

It was using more than the 15 GB of memory available to the n1-standard-4 machine type, and during some portions of the installation CPU load was near 100% across all 4 processors, so I've increased the machine type to n1-standard-16.

This improves the performance of the GPU driver build script, which uses make -j$(nproc) to parallelize compilation of the NVIDIA kernel driver. With -j1, the build takes much longer than with -j16. I would hope the same holds for the conda build process, but it appears to be single-threaded.