DS4SD / docling

Get your documents ready for gen AI
https://ds4sd.github.io/docling
MIT License
10.76k stars, 528 forks

Poor Extraction of Mathematical Expressions in Document Conversion Output #212

Open jiraiya1729 opened 3 weeks ago

jiraiya1729 commented 3 weeks ago

I tried extracting data from a PDF containing the image below. [image] However, the result was: [image] The output was not accurate, especially for the basic mathematical expressions.

cau-git commented 2 weeks ago

@jiraiya1729 Thanks for the report. Mathematical expressions in digital PDFs are often encoded in obfuscated and incomplete ways, as seen in your case. We are actively working on bringing an equation recognition model to Docling soon to solve these issues.

csv610 commented 2 weeks ago

Hi, when do you plan to address this issue and release the updated version?

AbdulKhaliq293 commented 2 weeks ago

@cau-git Thanks for taking notice; I am also struggling with the exact same issue. Is there any workaround for this problem?