UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0
15.16k stars · 2.47k forks

[Feature Request] Remove hard dependencies to install CUDA and OpenAI python packages. #2904

Closed chaudhariatul closed 2 months ago

chaudhariatul commented 2 months ago

Feature request

Request to remove the hard dependencies on the CUDA and OpenAI Python packages when installing the sentence-transformers Python package.

Motivation

Many Python packages default to CUDA-enabled builds, so when developing on non-CUDA devices and using different embedding models, a large number of unnecessary dependencies are downloaded.

A Docker image built from these requirements often reaches 9 GB, and a local Python environment uses 6 GB or more.

$ pip install sentence-transformers

...
output clipped
...
Successfully installed MarkupSafe-2.1.5 Pillow-10.4.0 certifi-2024.7.4 charset-normalizer-3.3.2 filelock-3.15.4 fsspec-2024.6.1 huggingface-hub-0.24.6 idna-3.7 jinja2-3.1.4 joblib-1.4.2 mpmath-1.3.0 networkx-3.3 numpy-2.1.0 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.6.20 nvidia-nvtx-cu12-12.1.105 packaging-24.1 pyyaml-6.0.2 regex-2024.7.24 requests-2.32.3 safetensors-0.4.4 scikit-learn-1.5.1 scipy-1.14.1 sentence-transformers-3.0.1 sympy-1.13.2 threadpoolctl-3.5.0 tokenizers-0.19.1 torch-2.4.0 tqdm-4.66.5 transformers-4.44.2 triton-3.0.0 typing-extensions-4.12.2 urllib3-2.2.2

A reduced storage footprint would allow the following:

  1. Smaller Docker images
  2. Faster downloads and reduced bandwidth usage when pulling images and packages
  3. Faster deployments
  4. Fewer vulnerabilities and security risks, since packages that are not required are not included

Your contribution

I'll test a combination of packages that can help reduce the storage footprint.

Link: https://github.com/huggingface/transformers/issues/32904

tomaarsen commented 2 months ago

Hello!

Indeed, Sentence Transformers has a few direct dependencies and a good amount of indirect dependencies. For reference, these are the direct ones at this time: https://github.com/UKPLab/sentence-transformers/blob/add421f21508cd2baf4cd32af31624c63b355a1d/pyproject.toml#L32-L41

Out of all of the direct and indirect dependencies, torch and in particular its (optional) CUDA dependencies use the most disk space. On Unix devices, the default torch behaviour is to install with CUDA, whereas on Windows, the default behaviour is to install without CUDA. So, if you're using a Docker image, you're likely installing torch with CUDA by default. If you're indeed using a GPU, then there's no way around this - you can't really shrink torch & CUDA. But if you're not using a GPU, then my recommendation is to first install torch without CUDA (pip install torch --index-url https://download.pytorch.org/whl/cpu), and then install sentence-transformers. During the ST installation, pip will see that torch is already installed and won't reinstall it with CUDA.

Additionally, when you install something via pip, pip caches the wheel file in addition to installing the Python files for that module. On your PC/laptop this is usually fine, but in a Docker image you might want to avoid it (you're usually only installing with pip once, so you never need that cache). You can use the --no-cache-dir flag with pip install to get a smaller image: https://pip.pypa.io/en/stable/cli/pip/#cmdoption-no-cache-dir.

Here's an indication of the difference in file sizes, excluding the unnecessary cache (note: this is on Windows, so there's no Triton):

With CUDA: 5.3GB image

Without CUDA: 1.6GB image
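To put those two numbers in perspective, a quick back-of-the-envelope calculation (using only the image sizes quoted above) shows the relative saving:

```python
# Image sizes reported above (in GB)
with_cuda_gb = 5.3
without_cuda_gb = 1.6

# Relative reduction from dropping CUDA
savings = 1 - without_cuda_gb / with_cuda_gb
print(f"{savings:.0%}")  # → 70%
```

So the CPU-only route cuts roughly 70% of the image size in this measurement.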

So, if you're not using a GPU, then my recommendation is:

pip install --no-cache-dir -U torch --index-url https://download.pytorch.org/whl/cpu
pip install --no-cache-dir sentence-transformers
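Put together in a Dockerfile, the two recommendations might look like this (a minimal sketch; the base image tag and the final command are illustrative assumptions, not from this thread):

```dockerfile
# Slim Python base keeps the OS layer small (illustrative tag)
FROM python:3.11-slim

# Install CPU-only torch first so sentence-transformers reuses it,
# and skip pip's wheel cache to keep the layer small
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu && \
    pip install --no-cache-dir sentence-transformers

# Illustrative command; replace with your application's entrypoint
CMD ["python", "-c", "import sentence_transformers; print(sentence_transformers.__version__)"]
```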

And your Docker image should shrink by a good amount. If you are using a GPU, then there's not that much that can be done other than not using a cache:

pip install --no-cache-dir sentence-transformers

And to answer your proposal more concretely: I can't make any of the direct dependencies optional (except maybe Pillow/PIL, which is ~7MB), as they are all crucial to the core functionality of Sentence Transformers.

chaudhariatul commented 2 months ago

This worked! Thank you Tom!