Open arthurbrenno opened 5 months ago
Thanks for the suggestion, @arthurbrenno. We'll take a look at this. I think this would have the side benefit of reducing the size of our CPU images.
Tysm! It would save us about 3 GB of storage.
@arthurbrenno see here #2976
Installing the CPU-only build of torch before the unstructured libs should help. That way the NVIDIA GPU libs for PyTorch are not installed. This is what I have been doing to build Lambda images.
Thank you, @sidatcd!
@sidatcd I need to speed up unstructured. Can it use a GPU? If so, what are the steps to make it use one?
For anyone who uses Poetry, you can accomplish this in your pyproject.toml with these commands:
$ poetry source add --priority=explicit pytorch-cpu https://download.pytorch.org/whl/cpu
$ poetry add --source pytorch-cpu torch
The result in your pyproject.toml will look like this:
onnxruntime = "^1.18.1"
torch = {version = "^2.5.0+cpu", source = "pytorch-cpu"}
unstructured = {extras = ["csv", "doc", "docx", "pdf", "ppt", "pptx", "xlsx"], version = "^0.16.3"}
[[tool.poetry.source]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
priority = "explicit"
Sources: https://github.com/python-poetry/poetry/issues/7685 https://github.com/python-poetry/poetry/pull/8246/commits/948f3a9b95a200525223b897beaa92c8b255a444
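One way to check that the configuration above actually resolved the CPU build is to look at the local version label of the installed torch distribution. This is a minimal sketch, assuming wheels from the PyTorch CPU index carry a `+cpu` suffix as in the `^2.5.0+cpu` constraint above; `torch_build_label` is a hypothetical helper name, not part of torch or unstructured:

```python
from importlib import metadata
from typing import Optional


def torch_build_label(version: Optional[str] = None) -> Optional[str]:
    """Return the local version label of torch (e.g. 'cpu'), or None.

    If no version string is passed, the installed torch distribution is
    looked up; None is also returned when torch is not installed.
    """
    if version is None:
        try:
            version = metadata.version("torch")
        except metadata.PackageNotFoundError:
            return None
    # PEP 440 local version labels follow a '+', as in '2.5.0+cpu'.
    _, sep, label = version.partition("+")
    return label if sep else None
```

Calling `torch_build_label()` with no argument inside the project's virtualenv reads the installed version. Treat the result as a hint only: a missing label does not by itself prove that GPU libraries were installed.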
That said, I +1 having a CPU-only unstructured option to handle this.
I've been using unstructured for a while on a 100% CPU machine. I've noticed a lot of NVIDIA files (2+ GB) in my venv folder coming from PyTorch (possibly one of unstructured's dependencies).
Can I install a CPU-only version of unstructured? I've been partitioning for a while without using any GPU.
Here is my requirements.in file:
Note that there's no torch in it.
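To see which installed packages account for the NVIDIA files described above, the environment can be scanned with the standard library. This is a rough sketch, assuming the GPU runtime wheels pulled in by PyTorch use distribution names beginning with `nvidia-`; `gpu_runtime_packages` is a hypothetical helper, not something unstructured provides:

```python
from importlib import metadata
from typing import Iterable, List, Optional


def gpu_runtime_packages(names: Optional[Iterable[str]] = None) -> List[str]:
    """Return sorted names of distributions that look like NVIDIA runtime wheels.

    When no names are passed, the distributions installed in the current
    environment are inspected.
    """
    if names is None:
        names = (dist.metadata["Name"] for dist in metadata.distributions())
    # Normalize case and underscores so 'NVIDIA_cublas_cu12' also matches.
    normalized = {n.lower().replace("_", "-") for n in names if n}
    # Assumption: GPU runtime wheels are named with an 'nvidia-' prefix.
    return sorted(n for n in normalized if n.startswith("nvidia-"))
```

Running `gpu_runtime_packages()` in the affected venv should list the packages in question; an empty list suggests that no such wheels were installed.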