DS4SD / docling

Get your documents ready for gen AI
https://ds4sd.github.io/docling
MIT License

I get an error trying to export figures #406

Closed olsihoxha closed 4 days ago

olsihoxha commented 4 days ago

Bug

When I run a PDF file through the DocumentConverter, I get the following validation error:

pydantic_core._pydantic_core.ValidationError: 1 validation error for ImageRef
uri
  Input should be a valid URL, empty host [type=url_parsing, input_value='data:image/png;base64,iV...2WoecAAAAAASUVORK5CYII=', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/url_parsing
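Some context on the "empty host" message: the value being validated is an inline data URI (RFC 2397, the data: URL scheme), and a data URI has no host part at all. The following minimal, stdlib-only sketch assumes only that the value has the data:<mime>;base64,<payload> shape shown in the error. The helper name split_data_uri is hypothetical and is not part of docling or Pydantic.

```python
import base64
from urllib.parse import urlsplit


def split_data_uri(uri: str) -> tuple[str, str, bytes]:
    """Split a base64 data URI into (host, mime_type, decoded payload).

    Hypothetical helper for illustration only. The host is returned to show
    that it is empty for data URIs, which is what "empty host" refers to.
    """
    parts = urlsplit(uri)
    if parts.scheme != "data":
        raise ValueError(f"not a data URI: {uri[:30]}...")
    # Everything after "data:" lands in .path, e.g. "image/png;base64,iVBOR..."
    header, _, encoded = parts.path.partition(",")
    mime_type, _, encoding = header.partition(";")
    if encoding != "base64":
        raise ValueError("this sketch only handles base64-encoded data URIs")
    return parts.netloc, mime_type, base64.b64decode(encoded)


if __name__ == "__main__":
    # Illustrative payload: just the 8-byte PNG signature, not a real image.
    payload = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode()
    host, mime, data = split_data_uri("data:image/png;base64," + payload)
    print(repr(host), mime, data[:4])
```

The host comes back as an empty string. That is a normal property of data URIs, so the error reflects a validator that expects a host, not a malformed image.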

Steps to reproduce

I was running the exact code from export_figures.py. It failed on a local PDF, and I got the same error with the library's own sample input:

source = "https://arxiv.org/pdf/2408.09869"  # document per local path or URL

Docling version

Docling version: 2.7.0
Docling Core version: 2.4.0
Docling IBM Models version: 2.0.6
Docling Parse version: 2.1.0

Python version

Python 3.12.6

dolfim-ibm commented 4 days ago

This issue seems to have appeared with the Pydantic 2.10.0 release (16 hours ago).

Downgrading Pydantic works around it:

pip install "pydantic<2.10.0"
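If you want a script to check whether the installed Pydantic already satisfies this workaround, here is a minimal stdlib-only sketch. It compares the numeric release segment against the 2.10.0 bound from the pin. Pre-release and local suffixes are ignored (2.10.0b1 is treated as 2.10.0), so this simplifies full PEP 440 ordering. The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Upper bound taken from the workaround in this thread: pip install "pydantic<2.10.0"
PINNED_BELOW = (2, 10, 0)


def release_tuple(ver: str) -> tuple[int, ...]:
    """Parse the leading numeric release, e.g. '2.9.2' -> (2, 9, 2).

    Simplified: suffixes such as 'b1' or '+local' are dropped, and short
    versions are zero-padded so '2.10' compares like '2.10.0'.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def satisfies_workaround(ver: str) -> bool:
    """True if ver falls below the pinned 2.10.0 bound."""
    return release_tuple(ver) < PINNED_BELOW


if __name__ == "__main__":
    try:
        installed = version("pydantic")
    except PackageNotFoundError:
        print("pydantic is not installed")
    else:
        print(f"pydantic {installed} satisfies <2.10.0: {satisfies_workaround(installed)}")
```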

We will soon release a fix that pins this version. It seems we are not the only ones doing this; see, for example, LlamaIndex: https://github.com/run-llama/llama_index/issues/17016.

vagenas commented 4 days ago

@olsihoxha until Pydantic fixes the issue upstream, we have temporarily constrained it to versions below the buggy release.

To apply this fix, upgrade your docling-core:

pip install --upgrade docling-core
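After upgrading, you can check what pip actually resolved by printing the installed versions in the same shape as the report at the top of the issue. This is a minimal sketch using importlib.metadata from the standard library. The names are the PyPI distribution names, and the helper name installed_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# PyPI distribution names for the packages in the bug report, plus pydantic
PACKAGES = ("docling", "docling-core", "docling-ibm-models", "docling-parse", "pydantic")


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

With the constrained docling-core installed, the pydantic line should show a version below 2.10.0.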

I am therefore closing this for now, but feel free to reopen it if the issue persists after the update described above.