VikParuchuri / marker

Convert PDF to markdown quickly with high accuracy
https://www.datalab.to
GNU General Public License v3.0
16.8k stars, 955 forks

Improved Table Parsing #290

Open m9e opened 1 week ago

m9e commented 1 week ago

First, impressive work. I thought I'd share my impressions from a first run-through. I read the README and understand that this is challenging. Source doc: https://investors.exeloncorp.com/static-files/2068bc1d-d1aa-49a8-b49f-d8d01ef43a14

There are a bunch of good examples here, both of cases where marker crushes it and of some where it whiffs on tables.

The screenshot (Screenshot 2024-09-24 at 3 45 24 PM) is a good view of a particularly challenging table, and marker gets it so close.

I also felt this was a particularly good table because it illustrates that common format where you see both horizontal and vertical headers or labels, but the top-left cell (0,0 if this were a grid counted down and right) is blank. Will join Discord!
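To make the blank-corner layout concrete, here is a minimal sketch of how such a grid could be written out as a Markdown table. It is not marker's code; the function name `to_markdown` and the sample labels and numbers are hypothetical and are not taken from the Exelon document.

```python
def to_markdown(column_labels, row_labels, cells):
    """Render a grid with column and row labels as a Markdown table.

    The top-left cell (0, 0) has no label, so it is emitted as an
    empty header cell rather than dropped, which keeps every data
    column aligned under its own header.
    """
    header = [""] + list(column_labels)  # blank (0, 0) corner
    lines = ["| " + " | ".join(header) + " |"]
    lines.append("|" + " --- |" * len(header))  # separator row
    for label, row in zip(row_labels, cells):
        lines.append("| " + " | ".join([label] + [str(v) for v in row]) + " |")
    return "\n".join(lines)


# Hypothetical sample data, for illustration only.
table = to_markdown(
    ["2023", "2024"],
    ["Revenue", "Costs"],
    [[10, 12], [7, 8]],
)
print(table)
```

The point of the sketch is that the empty corner still occupies a column: if a parser skips it, every column header shifts one position to the left of its data.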