benjamid opened this issue 2 years ago
I'm sadly not familiar with poetry. But installing Pytorch only with CPU support works and significantly reduced the docker image size.
I did it here for another project: https://github.com/UKPLab/EasyNMT/blob/5ea48f5fb68be9e4be4b8096800e32b8ad9a45df/docker/api/cpu.dockerfile#L5
Thanks. It may not be unique to Poetry. Our main issue is that when we install the CPU-only version, dependency resolution still pulls in the regular one, because the CPU-only build we have doesn't satisfy the S-BERT dependency. Did your setup meet the dependencies/version ranges listed by S-BERT these days? If so, we might be able to adjust our versions so that what we are doing works.
Yes, the setup works and you get the CPU version of Pytorch. As mentioned, I'm just familiar with pip, so sadly don't know what happens with poetry there.
But maybe you can install sentence-transformers without dependencies?
So you install: 1) torch CPU 2) transformers 3) tqdm numpy scikit-learn scipy nltk sentencepiece 4) Install sentence transformers without dependencies
That's a great lead. Do you have the versions that you used for these that worked?
No specific version needed. For pytorch I used 1.8.0 as it was the most recent when I created the docker.
How would I "Install sentence transformers without dependencies"? Thanks.
I got the same problem -- I only have 2 GB of disk available. I tried `pip install --no-cache-dir torch==1.8.0+cpu -f https://download.pytorch.org/whl/torch_stable.html` and then `pip install sentence-transformers`. It seems pip then tries to upgrade torch to torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl (750.6 MB).
First install the dependencies individually and then install sentence-transformers without dependencies
👍 Thanks for the reply. I'll give it another try.
pip install --no-cache-dir torch==1.8.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
pip install transformers tqdm numpy scikit-learn scipy nltk sentencepiece
pip install sentence-transformers
I tried this on Debian 11 with Python 3.8.13 and it does not seem to work. The last step (`pip install sentence-transformers`) still installs torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl (750.6 MB). Am I doing something wrong? Thanks.
OK I got it:
pip install --no-deps sentence-transformers
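To confirm the workaround actually left the CPU wheel in place, you can check whether `torch.__version__` carries the `+cpu` local version suffix. A small helper for a CI script (my own sketch, not from this thread):

```shell
# is_cpu_build VERSION
# Returns success when a torch version string names a CPU-only build,
# i.e. it carries the "+cpu" local version suffix.
is_cpu_build() {
  case "$1" in
    *+cpu) return 0 ;;
    *)     return 1 ;;
  esac
}

# Typical use after installing:
#   is_cpu_build "$(python -c 'import torch; print(torch.__version__)')" \
#     && echo "CPU-only torch installed"
```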
I think the issue happens because pip isn't able to resolve dependencies with local version suffixes like '+cpu' after the version number. So, if you have a CPU-only version of torch, it fails the 'torch>=1.6.0' dependency check in sentence-transformers.
Two solutions: 1) As stated above, install the dependencies first and then install sentence-transformers with the --no-deps flag. 2) Clone the library and change the dependency to match your version. For instance, if you have torch version '1.13.1+cpu', change the dependency to 'torch==1.13.1+cpu'.
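Solution 2 can be scripted. As a rough sketch (the actual requirement string in the sentence-transformers setup files may differ, so verify it in the cloned repo first), a helper that rewrites a `torch>=X` requirement to an exact CPU build:

```shell
# pin_torch_cpu FILE VERSION
# Rewrites a "torch>=X" requirement in FILE to the exact CPU build VERSION.
# Illustrative only: check the real requirement string before running this.
pin_torch_cpu() {
  file=$1
  version=$2
  sed -i "s/torch>=[0-9.]*/torch==${version}+cpu/" "$file"
}

# e.g. after cloning the library, if you have torch 1.13.1+cpu installed:
#   pin_torch_cpu setup.py 1.13.1
#   pip install .
```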
This worked for me.
# torch CPU
# ----------------------
RUN pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cpu
# sentence transformers deps
# ----------------------
RUN pip3 install transformers tqdm numpy scikit-learn scipy nltk sentencepiece
# install sentence transformers no deps
# ----------------------
RUN pip3 install --no-deps sentence-transformers
A poetry-only solution I've found (without using pip workarounds) is to specify the CPU wheels explicitly in the pyproject.toml:
[tool.poetry.dependencies]
...
torch = [
{url = "https://download.pytorch.org/whl/cpu/torch-2.0.1%2Bcpu-cp38-cp38-linux_x86_64.whl", markers = "sys_platform == 'linux'"},
{url = "https://download.pytorch.org/whl/cpu/torch-2.0.1-cp38-none-macosx_11_0_arm64.whl", markers = "sys_platform == 'darwin'"},
]
sentence-transformers = "^2.2.2"
(To add support for Windows, find the wheel in the list here).
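For reference, a Windows entry would follow the same pattern as the Linux and macOS ones above. The exact wheel filename below is my guess; verify it against the index before relying on it:

```toml
# Hypothetical Windows entry for the torch list above:
{url = "https://download.pytorch.org/whl/cpu/torch-2.0.1%2Bcpu-cp38-cp38-win_amd64.whl", markers = "sys_platform == 'win32'"}
```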
> Yes, the setup works and you get the CPU version of Pytorch. As mentioned, I'm just familiar with pip, so sadly don't know what happens with poetry there.
> But maybe you can install sentence-transformers without dependencies?
> So you install:
> - torch CPU
> - transformers
> - tqdm numpy scikit-learn scipy nltk sentencepiece
> - Install sentence transformers without dependencies
I also needed pillow
This came from a slightly different context of me trying to get sentence transformers working as a dependency in a CI runner. Not exactly a pure poetry solution, but it will allow you to still use poetry outside of the docker build.
poetry export -f requirements.txt \
--output requirements.txt \
--without-hashes --with dev && \
pip install -r requirements.txt
From here you could uninstall the libraries you specifically want to exclude, e.g.
pip uninstall nvidia-cublas-cu11 nvidia-cuda-nvrtc-cu11 nvidia-cuda-runtime-cu11
and so on.
For a pip-only solution, just add `--extra-index-url` at the top of your requirements.txt:
--extra-index-url https://download.pytorch.org/whl/cpu
sentence_transformers
some-package
some-other-package
...
I am not too familiar with poetry and .toml files, but if there is a way to supply an extra index URL, then you could test if that works.
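Poetry does have an equivalent of an extra index: package sources declared in pyproject.toml. An untested sketch of what that might look like (the source name is my own choice):

```toml
# Supplemental source, roughly equivalent to pip's --extra-index-url:
[[tool.poetry.source]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
priority = "supplemental"
```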
It would be great if both this project and PyTorch allowed you to `pip install` something akin to a `[cpu-only]` package extra, so you could run transformers entirely CPU-bound without installing any of the CUDA tooling or other GPU libraries. This would make the setup in `pyproject.toml` for poetry relatively painless without having to resort to `--only-root` or a separate installation step. I imagine in the future users will also wish to install tools for AMD, but not Nvidia, as tooling support for AMD increases in scope and users opt to buy much cheaper AMD cards over the VRAM-equivalent Nvidia offerings.
I suspect quite a few users of this library will end up wanting to instrument simple services that occasionally generate embeddings/vectors for small samples of text, possibly surrounded by cache layers, without something so complex that they also need to provision GPUs to run the workloads efficiently.
@cyrfar's solution worked for me. Saved my bacon. Thanks!
Hey!
I wanted to share a solution I found for managing CPU-only dependencies with Poetry, particularly for PyTorch and sentence-transformers. This approach ensures no unnecessary GPU-related dependencies are installed, keeping the environment and Docker images lightweight.
Here's the pyproject.toml configuration that worked for me:
[tool.poetry]
name = "poetry-cpu-test"
version = "0.1.0"
description = ""
authors = ["vitormanita <name@email.com>"]
readme = "README.md"
[tool.poetry.dependencies]
python = "^3.11"
torch = [
{ url = "https://download.pytorch.org/whl/cpu/torch-2.1.1%2Bcpu-cp311-cp311-linux_x86_64.whl", markers = "sys_platform == 'linux'"},
{ url = "https://download.pytorch.org/whl/cpu/torch-2.1.1-cp311-none-macosx_11_0_arm64.whl", markers = "sys_platform == 'darwin' and platform_machine == 'arm64'"}
]
numpy = "^1.26.3"
sentence-transformers = "^2.2.2"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
Key points:
- The torch dependency is pinned to specific CPU wheels by URL, with sys_platform markers selecting the right wheel per platform.
- With the torch dependency in place, I added sentence-transformers using poetry add. Poetry took care of resolving and installing all the necessary dependencies without pulling in any GPU-specific packages.
- To verify, check the generated poetry.lock file for any NVIDIA or CUDA references.
This approach builds on a similar solution by @sradc (kudos for the inspiration!). It's been effective for my use case and might help others looking to maintain a clean, CPU-only environment.
To use this configuration:
1. Replace your pyproject.toml with the configuration provided above.
2. Run poetry lock (or poetry lock --no-update) to generate the lock file.
I hope this helps others facing the same issue. Feel free to adapt the wheel URLs or system markers as required for your setup.
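The check of poetry.lock for NVIDIA or CUDA references can be automated with a simple grep heuristic. A sketch of my own, not exhaustive:

```shell
# check_lock_cpu_only FILE
# Fails if any NVIDIA/CUDA package names appear in the lock file.
# Crude heuristic: it may miss GPU packages with other names.
check_lock_cpu_only() {
  if grep -qiE 'nvidia-|cudnn|cublas|cuda-' "$1"; then
    echo "GPU packages found in $1"
    return 1
  fi
  echo "lock file looks CPU-only"
}

# e.g. in CI, after poetry lock:
#   check_lock_cpu_only poetry.lock || exit 1
```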
@vitormanita, your method works; my team did the same thing independently literally yesterday!
However, a problem in how Poetry's requirements.txt export plugin works blocks us from exporting the requirements.txt we use to build our production containers. See https://github.com/python-poetry/poetry-plugin-export/issues/183#issuecomment-1874662722 if you try this and hit a "Dependency walk failed" error in Poetry 1.7.x and older.
The way I went about it was to use this stackoverflow answer to add the pytorch-cpu wheel repository to poetry and install torch from there. The pyproject.toml will end up looking something like:
...
[tool.poetry.dependencies]
torch = {version = "^2.2.1+cpu", source = "pytorch"}
torchvision = {version = "^0.17.1+cpu", source = "pytorch"}
sentence-transformers = "^2.2.0"
[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cpu"
priority = "explicit"
btw @colindean I had the exact same issue with exporting to requirements.txt with the above answer as well, but with this approach I don't seem to run into it 🤷
@colindean , I encountered a similar issue; this is the workaround I discovered.
In some of our projects we are running into issues with Docker container sizes getting quite large. One of the main culprits is PyTorch. As we are deploying to systems that don't have GPUs, we believe we could save space by using the CPU-only PyTorch release. However, we have had some trouble getting Poetry to set up dependencies in a way where S-BERT will use PyTorch CPU rather than also installing its preferred PyTorch flavor.
Is there any advice, or are there recipes, for using Poetry to install a CPU-only version of PyTorch that S-BERT will accept to fill its dependencies (or to override the dependencies in a way that is safe)? We are also open to other methods of reducing image size, but PyTorch is currently our focus as it is substantially larger than our other dependencies.