All these structures were converted into Level 2 (##)

VikParuchuri / marker

Convert PDF to markdown quickly with high accuracy

https://www.datalab.to

GNU General Public License v3.0

16.49k stars 927 forks

All these structures were converted into Level 2 (##) #68

Open dtthanh1971 opened 8 months ago

dtthanh1971 commented 8 months ago

I converted a PDF of a book. The book is structured into Sections (level 1), Chapters (level 2), and Headings (level 3), but Marker converted all of them into level 2 (##) headings in the Markdown output.

VikParuchuri commented 8 months ago

It doesn't differentiate between header levels right now. I'm planning to improve the detection of block types, but it's behind a few other things on the roadmap.
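
Until header levels are detected, one possible workaround is to post-process the Markdown output. The sketch below is not part of Marker. It assumes every heading comes out as `##` (as reported above) and that the book's titles begin with recognizable words such as "Section" and "Chapter". Those patterns, and the names `LEVEL_RULES` and `relevel_headings`, are hypothetical and would need adapting to the actual book.

```python
import re

# Hypothetical title patterns: adjust these to the book being converted.
LEVEL_RULES = [
    (re.compile(r"^Section\b", re.IGNORECASE), 1),
    (re.compile(r"^Chapter\b", re.IGNORECASE), 2),
]
DEFAULT_LEVEL = 3  # any other "##" heading is treated as a level 3 heading

HEADING_RE = re.compile(r"^##\s+(.*)$")  # matches level 2 headings only
FENCE = "`" * 3  # code fence marker, built this way to keep the example readable


def relevel_headings(markdown: str) -> str:
    """Rewrite '##' headings to a level chosen from their title text."""
    out = []
    in_fence = False
    for line in markdown.splitlines():
        # Track fenced code blocks so '##' inside code is left alone.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            title = match.group(1).strip()
            level = next(
                (lvl for pattern, lvl in LEVEL_RULES if pattern.search(title)),
                DEFAULT_LEVEL,
            )
            line = "#" * level + " " + title
        out.append(line)
    return "\n".join(out)
```

This only works when the title text itself reveals the level. For books without such naming conventions, the level would have to come from layout cues such as font size, which this sketch does not attempt.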

SichangHe commented 5 days ago

For some examples, Marker cannot identify the headers at all. Example (you can find the original PDF in the same repo). Looks like something to be improved. Hope the example helps!