DS4SD / docling-serve

Running Docling as an API service
MIT License

Issue running on a Fedora39 nvidia GPU node. #8

Closed (nerdalert closed this 1 month ago)

nerdalert commented 1 month ago

Getting the following on Fedora39 + CUDA. Digging into it, just tracking it here:

$ poetry run uvicorn docling_serve.app:app --reload
INFO:     Will watch for changes in these directories: ['/home/fedora/brent/docling-serve/docling_serve']
INFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
INFO:     Started reloader process [268428] using StatReload
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib64/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/fedora/.cache/pypoetry/virtualenvs/docling-serve-lV4CwX38-py3.12/lib/python3.12/site-packages/uvicorn/_subprocess.py", line 80, in subprocess_started
    target(sockets=sockets)
  File "/home/fedora/.cache/pypoetry/virtualenvs/docling-serve-lV4CwX38-py3.12/lib/python3.12/site-packages/uvicorn/server.py", line 65, in run
    return asyncio.run(self.serve(sockets=sockets))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/base_events.py", line 685, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/fedora/.cache/pypoetry/virtualenvs/docling-serve-lV4CwX38-py3.12/lib/python3.12/site-packages/uvicorn/server.py", line 69, in serve
    await self._serve(sockets)
  File "/home/fedora/.cache/pypoetry/virtualenvs/docling-serve-lV4CwX38-py3.12/lib/python3.12/site-packages/uvicorn/server.py", line 76, in _serve
    config.load()
  File "/home/fedora/.cache/pypoetry/virtualenvs/docling-serve-lV4CwX38-py3.12/lib/python3.12/site-packages/uvicorn/config.py", line 434, in load
    self.loaded_app = import_from_string(self.app)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/fedora/.cache/pypoetry/virtualenvs/docling-serve-lV4CwX38-py3.12/lib/python3.12/site-packages/uvicorn/importer.py", line 19, in import_from_string
    module = importlib.import_module(module_str)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/home/fedora/brent/docling-serve/docling_serve/app.py", line 14, in <module>
    from docling.document_converter import DocumentConverter
  File "/home/fedora/.cache/pypoetry/virtualenvs/docling-serve-lV4CwX38-py3.12/lib/python3.12/site-packages/docling/document_converter.py", line 32, in <module>
    from docling.pipeline.standard_model_pipeline import StandardModelPipeline
  File "/home/fedora/.cache/pypoetry/virtualenvs/docling-serve-lV4CwX38-py3.12/lib/python3.12/site-packages/docling/pipeline/standard_model_pipeline.py", line 6, in <module>
    from docling.models.table_structure_model import TableStructureModel
  File "/home/fedora/.cache/pypoetry/virtualenvs/docling-serve-lV4CwX38-py3.12/lib/python3.12/site-packages/docling/models/table_structure_model.py", line 5, in <module>
    from docling_ibm_models.tableformer.data_management.tf_predictor import TFPredictor
  File "/home/fedora/.cache/pypoetry/virtualenvs/docling-serve-lV4CwX38-py3.12/lib/python3.12/site-packages/docling_ibm_models/tableformer/data_management/tf_predictor.py", line 12, in <module>
    import torch
  File "/home/fedora/.cache/pypoetry/virtualenvs/docling-serve-lV4CwX38-py3.12/lib64/python3.12/site-packages/torch/__init__.py", line 290, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: libcudnn.so.9: cannot open shared object file: No such file or directory
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

dolfim-ibm commented 1 month ago

@nerdalert I suspect there is a mismatch between torch and the CUDA version on your machine. The repo currently depends on the PyPI torch package. According to the install details on https://pytorch.org/, I think that package is compiled for CUDA 12.1. The CUDA 12.4 build is available from a separate index, e.g. with --extra-index-url https://download.pytorch.org/whl/cu124.
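
Since `import torch` itself fails here, a quick check that does not import torch may help narrow this down. The following is only a rough sketch: the helper names `installed_torch_version` and `can_load` are made up for illustration, and it assumes that wheels from the PyTorch index carry a local version suffix such as `+cu124` while the plain PyPI wheel does not.

```python
import ctypes
from importlib import metadata


def installed_torch_version():
    """Return the installed torch version string, or None if torch is not installed.

    Reads package metadata only, so it works even when `import torch` fails.
    """
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library.

    Only the default loader search path is checked; NVIDIA libraries that pip
    installs under site-packages are not covered by this check.
    """
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("torch version:", installed_torch_version())
    print("libcudnn.so.9 loadable:", can_load("libcudnn.so.9"))
```

Run it with the interpreter of the same virtualenv that uvicorn uses (for example via `poetry run python`), otherwise the reported torch version may belong to a different environment.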

Setting it up with poetry won't be trivial.

Untested, but you could try installing it with:

# create a virtual env
python3 -m venv venv
# activate the venv
source venv/bin/activate

# install torch with the special index
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# install this package
pip install -e .
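
To confirm that the reinstall picked up the intended build, something like the sketch below could be run inside the new venv. It is equally untested on this machine, and the function name `torch_build_info` is just an illustration. It reports the torch version, the CUDA version the wheel was built against, and whether a GPU is visible, and it catches the same ImportError shown in the traceback instead of crashing.

```python
def torch_build_info():
    """Collect basic facts about the torch build in the current environment."""
    try:
        import torch
    except ImportError as exc:
        # Covers both "torch not installed" and missing shared libraries
        # such as libcudnn.so.9.
        return {"importable": False, "error": str(exc)}
    return {
        "importable": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was compiled against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_build_info().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` does not match the 12.4 toolkit reported by `nvcc --version`, the wheel most likely still came from the default PyPI index.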