DS4SD / docling

🥚 Transform PDF to JSON or Markdown with ease and speed 🐣
MIT License
196 stars 18 forks

Segfault when running from `poetry` #9

Closed dolfim-ibm closed 6 days ago

dolfim-ibm commented 1 month ago

On a fresh system, the following fails:

poetry install
poetry run python example/convert.py

[1]    27598 killed     poetry run python

It seems to be caused by cv2. An easy way to reproduce it is:

❯ poetry run python -c "import cv2"

[1]    27598 killed     poetry run python -c "import cv2"
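The reproduction above can also be scripted so that the crash happens in a child process and the exit status can be inspected. This is only a minimal sketch, not part of docling: the helper name `import_survives` is hypothetical, and it assumes a POSIX system, where a negative return code from `subprocess` means the child was terminated by a signal.

```python
import subprocess
import sys


def import_survives(module_name: str) -> tuple[bool, int]:
    """Try `import <module_name>` in a child interpreter.

    Returns (ok, returncode). On POSIX, a negative returncode means the
    child was terminated by a signal (for example -9 for SIGKILL).
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0, result.returncode


if __name__ == "__main__":
    ok, code = import_survives("cv2")
    if ok:
        print("cv2 imported cleanly")
    elif code < 0:
        print(f"cv2 import was killed by signal {-code}")
    else:
        print(f"cv2 import failed with exit code {code}")
```

Saved to a file and started with poetry run python, it uses the same interpreter as the failing command, so it should distinguish a killed import from an ordinary ImportError.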

Workaround

Just removing opencv and reinstalling seems to magically fix it.

poetry run pip uninstall opencv_python
poetry run pip install opencv_python

# run again 
poetry run python -c "import cv2"  # it works!

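System tested

  • MacBook Pro M3 Max
  • macOS 14.5 (23F79)
  • Python 3.12 and Python 3.11 (installed via brew)
  • Docling 0.2.0 and 0.3.0 tested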

dolfim-ibm commented 1 month ago

If users simply run pip install docling, there is no problem.

beetter commented 2 weeks ago

On a fresh system, the following fails

poetry install
poetry run python example/convert.py

[1]    27598 killed     poetry run python

It seems to be caused by cv2, an easy way to reproduce it is

❯ poetry run python -c "import cv2"

[1]    27598 killed     poetry run python -c "import cv2"

Workaround

Just removing opencv and reinstalling seems to magically fix it.

poetry run pip uninstall opencv_python
poetry run pip install opencv_python

# run again 
poetry run python -c "import cv2"  # it works!

System tested

  • MacBook Pro M3 Max
  • macOS 14.5 (23F79)
  • Python 3.12 and Python 3.11 (installed via brew)
  • Docling 0.2.0 and 0.3.0 tested

It doesn't work on my MacBook Pro M1 (16G+512G) with Python 3.11. Is the configuration too small?

dolfim-ibm commented 2 weeks ago

@beetter what exactly is your problem?

We confirm this is still a valid issue, but only in "development mode", i.e. when you clone the repo. If you simply want to use Docling, you can just run pip install docling.

beetter commented 2 weeks ago

@beetter what exactly is your problem?

We confirm this is still a valid issue, but only in "development mode", i.e. when you clone the repo. If you simply want to use Docling, you can just run pip install docling.

My computer configuration is: MacBook Pro M1 16G+512G. The operating environment is as follows: Python 3.11. [screenshots: hj1, hj2, hj3]

The error message is as follows: the program has terminated. [screenshot: er]

dolfim-ibm commented 2 weeks ago

And how did you install the package?

beetter commented 2 weeks ago

And how did you install the package?

I installed it with: pip install docling

cau-git commented 2 weeks ago

@beetter you have both opencv-python-headless and opencv-python installed in the same environment, which will break. Please start from a clean python virtual environment again and reinstall docling. Since you are on macOS, please ensure that no other package installs opencv-python-headless into the same environment.
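One way to check for the conflict described above is to list the installed distributions and look for more than one OpenCV wheel. The following is a minimal sketch, not something docling ships: the helper names are hypothetical, and the set of variants is the four OpenCV wheels published on PyPI, all of which provide the same cv2 module.

```python
import re
from importlib import metadata

# The OpenCV wheels on PyPI that all install the same `cv2` package.
OPENCV_VARIANTS = {
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
}


def normalize(name: str) -> str:
    """Normalize a distribution name (opencv_python -> opencv-python)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_opencv_variants(installed_names):
    """Return the sorted OpenCV variants present in `installed_names`."""
    return sorted({normalize(n) for n in installed_names} & OPENCV_VARIANTS)


def installed_distribution_names():
    """Names of all distributions visible to the current interpreter."""
    names = (dist.metadata["Name"] for dist in metadata.distributions())
    return [name for name in names if name]


if __name__ == "__main__":
    found = find_opencv_variants(installed_distribution_names())
    if len(found) > 1:
        print("Conflicting OpenCV packages:", ", ".join(found))
    else:
        print("OpenCV packages found:", found)
```

Run inside the affected virtual environment, it reports every variant it finds. More than one entry would match the situation described in this comment.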

beetter commented 2 weeks ago

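@beetter you have both opencv-python-headless and opencv-python installed in the same environment, which will break. Please start from a clean python virtual environment again and reinstall docling. Since you are on macOS, please ensure that no other package installs opencv-python-headless into the same environment.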

When I delete opencv-python, I get this error: [screenshot: er2]

beetter commented 2 weeks ago

@beetter you have both opencv-python-headless and opencv-python installed in the same environment, which will break. Please start from a clean python virtual environment again and reinstall docling. Since you are on macOS, please ensure that no other package installs opencv-python-headless into the same environment.

Also, the easyocr package is missing, and when I install it with pip install easyocr, the opencv_python_headless package is installed automatically. [screenshot: er3]

cau-git commented 2 weeks ago

@beetter thanks for the feedback, I see your point.

I reproduced the setup with a fresh venv and ended up with both opencv-python and opencv-python-headless installed, but I had no trouble running the minimal.py example. It works.

Could you please check if the steps below work for you? Many thanks.

python3.11 -m venv venv
source venv/bin/activate
pip install "docling[ocr]==1.6.0"

# now run minimal.py example again

cau-git commented 2 weeks ago

We are taking steps to align docling to use only opencv-python-headless, an updated release will follow shortly.

cau-git commented 2 weeks ago

@beetter We released docling 1.6.2 with several fixes for dependency alignment. Please re-check your case with docling==1.6.2, it should work better out-of-the-box and install easyocr without extra steps.

beetter commented 2 weeks ago

@beetter We released docling 1.6.2 with several fixes for dependency alignment. Please re-check your case with docling==1.6.2, it should work better out-of-the-box and install easyocr without extra steps.

Thank you, this worked for me. I have successfully run the project. Thank you very much!

cau-git commented 6 days ago

This appears to be resolved now.