UKPLab / sentence-transformers

State-of-the-Art Text Embeddings
https://www.sbert.net
Apache License 2.0

sentence-transformers 2.2.2 pulling in nvidia packages #2637

Open gyezheng opened 4 months ago

gyezheng commented 4 months ago

I am using sentence-transformers-2.2.2.tar.gz, and it pulls in the following nvidia packages:

nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl
nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl
nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl
nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl
nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl
nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl
nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl
nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl
nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl
nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl
nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl

When I search for them online, they are listed under the license "NVIDIA Proprietary Software". Can I freely use sentence-transformers-2.2.2.tar.gz?

Thanks!

tomaarsen commented 4 months ago

Hello!

Yes, these are dependencies of the torch Python package that are needed for you to use CUDA, i.e. a GPU. You can freely use them.

Note that if you don't have a GPU, you may want to install torch without CUDA support and then install sentence-transformers. You can use this widget and select "CPU" if that's the case; it'll save you some disk space. But if you do have a GPU, be sure to install with CUDA support like you've been doing.

gyezheng commented 4 months ago

Thank you for your reply! We are in the CPU-only case. I understand that from a technical perspective we can freely use those NVIDIA packages, but from a commercial perspective, can we ship them within our own commercial product? Is there any difference between the GPU and CPU cases commercially? Thanks!

tomaarsen commented 4 months ago

If you're using the CPU only, then you won't need those CUDA packages. You can install it with:

pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install sentence-transformers

(assuming that you're on Linux). And yes, torch and sentence-transformers have commercially permissive licenses, i.e. you can use them within (paid) commercial products.
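As a side note, when auditing what a (commercial) deployment actually ships, you can list the declared license of every installed distribution using only the standard library. This is just an illustrative sketch (the License field follows Python's core metadata spec, and some packages leave it undeclared):

```python
# Stdlib-only sketch: print the declared License field of every installed
# distribution, e.g. to audit what a commercial deployment actually ships.
from importlib.metadata import distributions

for dist in sorted(distributions(), key=lambda d: (d.metadata["Name"] or "").lower()):
    name = dist.metadata["Name"] or "(unknown)"
    declared = dist.metadata["License"] or "(not declared)"
    print(f"{name}: {declared}")
```

Note that this only reads the metadata each package declares about itself; it is not a substitute for a proper license scan.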

KyeMaloy97 commented 4 months ago

So at the moment, I have been running two pip commands: the first installs a load of dependencies from a requirements.txt, and the second installs torch with the CPU --index-url as you mentioned above.

pip install --no-deps -r requirements.txt
pip install --no-deps -r torch_requirements.txt

Maybe installing sentence-transformers from the first requirements.txt and only then installing torch was pulling in the 2.3.0 (with nvidia) version of torch as well?

KyeMaloy97 commented 4 months ago

If I do pip show torch I see:

Name: torch
Version: 1.13.1+cpu
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /usr/local/lib64/python3.9/site-packages
Requires: typing-extensions
Required-by: sentence-transformers, accelerate

So I'm not sure why/how we are getting the nvidia packages in our scans?

tomaarsen commented 4 months ago

If I do pip show torch I see:

...

That is rather odd. Perhaps you can run pip show on the CUDA packages (e.g. pip show cuda...) to see what they are required by? Because the CPU build of torch should not require CUDA.
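Another way to check directly (a stdlib-only sketch, independent of pip's CLI) is to enumerate the installed distributions and filter on the nvidia prefix; if this prints an empty list, the CUDA wheels are simply not installed in that environment:

```python
# List any installed distributions whose name starts with "nvidia";
# the CUDA runtime wheels (nvidia-cublas-cu12, nvidia-cudnn-cu12, ...)
# all share this prefix.
from importlib.metadata import distributions

nvidia_dists = sorted(
    d.metadata["Name"]
    for d in distributions()
    if (d.metadata["Name"] or "").lower().startswith("nvidia")
)
print(nvidia_dists or "no nvidia-* distributions installed")
```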

KyeMaloy97 commented 4 months ago

If I run pip show nvidia_cublas... or pip show cuda, I get "no packages found". I'm not convinced we are downloading the files our scanner thinks we're getting, as I cannot locate them on disk at all, and in my site-packages folder I don't see anything about nvidia or any .whl files matching what our scanner is finding.

I also think that if I were pulling those CUDA files, the Docker image would be a lot larger (it's only ~2.5 GB total; with the CUDA files I think it would be 8 GB+).

pip list gives me:

certifi               2024.2.2
charset-normalizer    3.3.2
click                 8.1.7
contourpy             1.2.1
cycler                0.12.1
eland                 8.12.1
elastic-transport     8.13.0
elasticsearch         8.13.0
filelock              3.14.0
fonttools             4.51.0
fsspec                2024.3.1
huggingface-hub       0.23.0
idna                  3.7
importlib_resources   6.4.0
joblib                1.4.2
kiwisolver            1.4.5
matplotlib            3.8.4
nltk                  3.8.1
numpy                 1.26.4
packaging             24.0
pandas                1.5.3
pillow                10.3.0
pip                   21.2.3
psutil                5.9.8
pyparsing             3.1.2
python-dateutil       2.9.0.post0
pytz                  2024.1
PyYAML                6.0.1
regex                 2024.4.28
requests              2.31.0
safetensors           0.4.3
scikit-learn          1.4.2
scipy                 1.13.0
sentence-transformers 2.2.2
setuptools            53.0.0
six                   1.16.0
tdqm                  0.0.1
threadpoolctl         3.5.0
tokenizers            0.14.1
torch                 1.13.1+cpu
torchvision           0.14.1+cpu
tqdm                  4.66.3
transformers          4.38.0
typing_extensions     4.9.0
urllib3               2.2.1
zipp                  3.18.1
KyeMaloy97 commented 4 months ago

For extra info I also installed pipdeptree and this was the output...

accelerate==0.29.3
├── huggingface-hub [required: Any, installed: 0.23.0]
│   ├── filelock [required: Any, installed: 3.14.0]
│   ├── fsspec [required: >=2023.5.0, installed: 2024.3.1]
│   ├── packaging [required: >=20.9, installed: 24.0]
│   ├── PyYAML [required: >=5.1, installed: 6.0.1]
│   ├── requests [required: Any, installed: 2.31.0]
│   │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│   │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│   │   ├── idna [required: >=2.5,<4, installed: 3.7]
│   │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│   ├── tqdm [required: >=4.42.1, installed: 4.66.3]
│   └── typing_extensions [required: >=3.7.4.3, installed: 4.9.0]
├── numpy [required: >=1.17, installed: 1.26.4]
├── packaging [required: >=20.0, installed: 24.0]
├── psutil [required: Any, installed: 5.9.8]
├── PyYAML [required: Any, installed: 6.0.1]
├── safetensors [required: >=0.3.1, installed: 0.4.3]
└── torch [required: >=1.10.0, installed: 1.13.1+cpu]
    └── typing_extensions [required: Any, installed: 4.9.0]
eland==8.12.1
├── elasticsearch [required: >=8.3,<9, installed: 8.13.0]
│   └── elastic-transport [required: >=8.13,<9, installed: 8.13.0]
│       ├── certifi [required: Any, installed: 2024.2.2]
│       └── urllib3 [required: >=1.26.2,<3, installed: 2.2.1]
├── matplotlib [required: >=3.6, installed: 3.8.4]
│   ├── contourpy [required: >=1.0.1, installed: 1.2.1]
│   │   └── numpy [required: >=1.20, installed: 1.26.4]
│   ├── cycler [required: >=0.10, installed: 0.12.1]
│   ├── fonttools [required: >=4.22.0, installed: 4.51.0]
│   ├── importlib_resources [required: >=3.2.0, installed: 6.4.0]
│   │   └── zipp [required: >=3.1.0, installed: 3.18.1]
│   ├── kiwisolver [required: >=1.3.1, installed: 1.4.5]
│   ├── numpy [required: >=1.21, installed: 1.26.4]
│   ├── packaging [required: >=20.0, installed: 24.0]
│   ├── pillow [required: >=8, installed: 10.3.0]
│   ├── pyparsing [required: >=2.3.1, installed: 3.1.2]
│   └── python-dateutil [required: >=2.7, installed: 2.9.0.post0]
│       └── six [required: >=1.5, installed: 1.16.0]
├── numpy [required: >=1.2.0,<2, installed: 1.26.4]
├── packaging [required: Any, installed: 24.0]
└── pandas [required: >=1.5,<2, installed: 1.5.3]
    ├── numpy [required: >=1.20.3, installed: 1.26.4]
    ├── python-dateutil [required: >=2.8.1, installed: 2.9.0.post0]
    │   └── six [required: >=1.5, installed: 1.16.0]
    └── pytz [required: >=2020.1, installed: 2024.1]
pipdeptree==2.20.0
├── packaging [required: >=23.1, installed: 24.0]
└── pip [required: >=23.1.2, installed: 24.0]
sentence-transformers==2.2.2
├── huggingface-hub [required: >=0.4.0, installed: 0.23.0]
│   ├── filelock [required: Any, installed: 3.14.0]
│   ├── fsspec [required: >=2023.5.0, installed: 2024.3.1]
│   ├── packaging [required: >=20.9, installed: 24.0]
│   ├── PyYAML [required: >=5.1, installed: 6.0.1]
│   ├── requests [required: Any, installed: 2.31.0]
│   │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│   │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│   │   ├── idna [required: >=2.5,<4, installed: 3.7]
│   │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│   ├── tqdm [required: >=4.42.1, installed: 4.66.3]
│   └── typing_extensions [required: >=3.7.4.3, installed: 4.9.0]
├── nltk [required: Any, installed: 3.8.1]
│   ├── click [required: Any, installed: 8.1.7]
│   ├── joblib [required: Any, installed: 1.4.2]
│   ├── regex [required: >=2021.8.3, installed: 2024.4.28]
│   └── tqdm [required: Any, installed: 4.66.3]
├── numpy [required: Any, installed: 1.26.4]
├── scikit-learn [required: Any, installed: 1.4.2]
│   ├── joblib [required: >=1.2.0, installed: 1.4.2]
│   ├── numpy [required: >=1.19.5, installed: 1.26.4]
│   ├── scipy [required: >=1.6.0, installed: 1.13.0]
│   │   └── numpy [required: >=1.22.4,<2.3, installed: 1.26.4]
│   └── threadpoolctl [required: >=2.0.0, installed: 3.5.0]
├── scipy [required: Any, installed: 1.13.0]
│   └── numpy [required: >=1.22.4,<2.3, installed: 1.26.4]
├── sentencepiece [required: Any, installed: ?]
├── torch [required: >=1.6.0, installed: 1.13.1+cpu]
│   └── typing_extensions [required: Any, installed: 4.9.0]
├── torchvision [required: Any, installed: 0.14.1+cpu]
│   ├── numpy [required: Any, installed: 1.26.4]
│   ├── pillow [required: >=5.3.0,!=8.3.*, installed: 10.3.0]
│   ├── requests [required: Any, installed: 2.31.0]
│   │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
│   │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
│   │   ├── idna [required: >=2.5,<4, installed: 3.7]
│   │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
│   ├── torch [required: ==1.13.1, installed: 1.13.1+cpu]
│   │   └── typing_extensions [required: Any, installed: 4.9.0]
│   └── typing_extensions [required: Any, installed: 4.9.0]
├── tqdm [required: Any, installed: 4.66.3]
└── transformers [required: >=4.6.0,<5.0.0, installed: 4.38.0]
    ├── filelock [required: Any, installed: 3.14.0]
    ├── huggingface-hub [required: >=0.19.3,<1.0, installed: 0.23.0]
    │   ├── filelock [required: Any, installed: 3.14.0]
    │   ├── fsspec [required: >=2023.5.0, installed: 2024.3.1]
    │   ├── packaging [required: >=20.9, installed: 24.0]
    │   ├── PyYAML [required: >=5.1, installed: 6.0.1]
    │   ├── requests [required: Any, installed: 2.31.0]
    │   │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
    │   │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
    │   │   ├── idna [required: >=2.5,<4, installed: 3.7]
    │   │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
    │   ├── tqdm [required: >=4.42.1, installed: 4.66.3]
    │   └── typing_extensions [required: >=3.7.4.3, installed: 4.9.0]
    ├── numpy [required: >=1.17, installed: 1.26.4]
    ├── packaging [required: >=20.0, installed: 24.0]
    ├── PyYAML [required: >=5.1, installed: 6.0.1]
    ├── regex [required: !=2019.12.17, installed: 2024.4.28]
    ├── requests [required: Any, installed: 2.31.0]
    │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
    │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
    │   ├── idna [required: >=2.5,<4, installed: 3.7]
    │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
    ├── safetensors [required: >=0.4.1, installed: 0.4.3]
    ├── tokenizers [required: >=0.14,<0.19, installed: 0.14.1]
    │   └── huggingface-hub [required: >=0.16.4,<0.18, installed: 0.23.0]
    │       ├── filelock [required: Any, installed: 3.14.0]
    │       ├── fsspec [required: >=2023.5.0, installed: 2024.3.1]
    │       ├── packaging [required: >=20.9, installed: 24.0]
    │       ├── PyYAML [required: >=5.1, installed: 6.0.1]
    │       ├── requests [required: Any, installed: 2.31.0]
    │       │   ├── certifi [required: >=2017.4.17, installed: 2024.2.2]
    │       │   ├── charset-normalizer [required: >=2,<4, installed: 3.3.2]
    │       │   ├── idna [required: >=2.5,<4, installed: 3.7]
    │       │   └── urllib3 [required: >=1.21.1,<3, installed: 2.2.1]
    │       ├── tqdm [required: >=4.42.1, installed: 4.66.3]
    │       └── typing_extensions [required: >=3.7.4.3, installed: 4.9.0]
    └── tqdm [required: >=4.27, installed: 4.66.3]
setuptools==53.0.0
tdqm==0.0.1
└── tqdm [required: Any, installed: 4.66.3]
tomaarsen commented 4 months ago

I think that looks fine, then! In fact, if you upgrade from sentence-transformers==2.2.2 to a more recent version, you'll actually lose the NLTK and sentencepiece dependencies. They're not particularly big though, so I wouldn't worry about it too much.

KyeMaloy97 commented 4 months ago

Do you happen to know if there's a check I can make to know for certain whether those nvidia*.whl files got installed? I had a look in /usr/bin and /usr/lib/python3.9/site-packages and didn't find anything; also, running `find / -iname "*.whl"` and `find / -iname "*nvidia*"` returns nothing.

tomaarsen commented 4 months ago

Searching for cud might also help, but other than that I'm not sure.

KyeMaloy97 commented 4 months ago

I had a look and it found a load of related files from torch, torchgen, and transformers... most of them are like /usr/local/lib64/python3.9/site-packages/torch/include/ATen/cuda/CUDATensorMethods.cuh and associated header files, or like /usr/local/lib/python3.9/site-packages/transformers/kernels/mra/cuda_kernel.cu

I think these are just source code files from those packages though, not the NVIDIA Proprietary Software.
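For what it's worth, the nvidia-*-cu12 wheels install into an `nvidia/` namespace directory inside site-packages, so a quick check like this (a sketch, not an official tool) can distinguish the actual NVIDIA packages from the .cu/.cuh source files that torch and transformers ship themselves:

```python
# Check each site-packages directory for the "nvidia/" namespace package
# that the nvidia-*-cu12 wheels install into. Header/source files such as
# CUDATensorMethods.cuh live under torch/ itself and don't count.
import site
from pathlib import Path

for sp in site.getsitepackages():
    nvidia_dir = Path(sp) / "nvidia"
    status = "present" if nvidia_dir.is_dir() else "absent"
    print(f"{sp}: nvidia/ {status}")
```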

champaanand commented 3 months ago

I'm also facing the same issue, where nvidia* packages are not getting downloaded and are not being used in our product. (Our application runs on Windows, where the inventory report shows the wheel packages.) Please let us know if there is any update.

KyeMaloy97 commented 3 months ago

Are you using an OSS scanning tool such as Mend? Our issue was that Mend, under the covers, does a pip download and ignores the fact that we were using --no-deps when installing the package, so the full pip download pulled in dependencies we were not actually getting.

champaanand commented 3 months ago

Yes, we are using Mend, integrated with our GitHub repo, and the Mend inventory shows these nvidia packages. And our open source approval team says not to use nvidia even though we are using it in our product. Kindly let me know how to proceed further.

tomaarsen commented 3 months ago

I'm a bit confused

where nvidia* packages are not getting downloaded, not being used also in our product.

Mend inventory shows these nvidia* packages. [...] even though we are using it in our product.

So the packages are not being downloaded, but would you like to download them or not?

In short, to use Sentence Transformers, you will have to use torch. You can install torch with GPU/CUDA support, or without it. To get GPU support, you will have to install torch with CUDA support, which means that you'll require NVIDIA CUDA-specific packages, e.g.:

pip install torch --index-url https://download.pytorch.org/whl/cu121

If you only want to run Sentence Transformers on CPU, then you don't need to install torch with CUDA, e.g.:

pip install torch --index-url https://download.pytorch.org/whl/cpu

The latter should not install NVIDIA's CUDA packages, I believe.
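To confirm which torch variant actually ended up installed, without even importing it, one can read the recorded version from package metadata; PyTorch's CPU-only wheels carry a `+cpu` local-version tag. A small sketch using only the standard library:

```python
# Read torch's installed version from metadata (no heavyweight import) and
# look for the "+cpu" local-version tag used by PyTorch's CPU-only wheels.
from importlib.metadata import version, PackageNotFoundError

try:
    torch_version = version("torch")
    print(f"torch {torch_version} (CPU-only wheel: {'+cpu' in torch_version})")
except PackageNotFoundError:
    print("torch is not installed in this environment")
```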

champaanand commented 3 months ago

We do not use nvidia packages. Our application doesn't need them. The problem is with the Mend inventory, as it shows nvidia packages, and our open source team says don't use nvidia*.

KyeMaloy97 commented 3 months ago

If your use case is like ours: we were installing the CPU version, which doesn't pull in the GPU-related packages, but Mend looks at the packages installed, seems to just ignore the install options (CPU-specific index, no dependencies, etc.), downloads everything, and then concludes "ah, Sentence Transformers requires NVIDIA packages" — which would be right if we weren't using the CPU-specific variant.

It's an issue with Mend.io rather than with this library, though. It's how they do their checking that causes the NVIDIA packages to be detected when they aren't actually present. We are using this library in a Docker image, and you can tell we don't get those packages: we looked through the filesystem and can't find them, and the image is small; if we were pulling them, the image would be hundreds of MBs larger than it is.