VikParuchuri / marker

Convert PDF to markdown quickly with high accuracy
https://www.datalab.to
GNU General Public License v3.0
14.14k stars 720 forks

Possible Bottleneck PyPDFium2 #155

Closed (AndiKarg closed 1 month ago)

AndiKarg commented 1 month ago

PyPDFium2 doesn't work for my use cases, as I have a lot of PDFs with fillable fields...

Is it possible to make the PDF library configurable? In my experience, PyMuPDF looks like the best option out there, so maybe this could further improve your results.

I think it's definitely worth a shot.

VikParuchuri commented 1 month ago

Marker v1 used pymupdf, but it is unfortunately AGPL-licensed, so I removed it in newer versions. If you use an older version of marker (check out an older git commit and install it manually), you can use pymupdf.

yasyf commented 1 month ago

@VikParuchuri maybe pymupdf could be supported as an optional dependency, if it's not too much work to maintain both codepaths?
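
For illustration only, here is a minimal sketch of how an optional-dependency switch between the two libraries might look. It is not marker's actual code or API: the `BACKENDS` table and the `available_backends` and `pick_backend` helpers are hypothetical names. It assumes pypdfium2 is imported as `pypdfium2` and PyMuPDF as `fitz` (its long-standing import name). The sketch only detects which modules are installed and picks one; it does not open or parse PDFs.

```python
import importlib.util

# Hypothetical registry: user-facing backend name -> importable module name.
# "fitz" is assumed here as the import name for PyMuPDF.
BACKENDS = {"pypdfium2": "pypdfium2", "pymupdf": "fitz"}


def available_backends(find_spec=importlib.util.find_spec):
    """Return the backend names whose modules can be located, without importing them."""
    return [name for name, module in BACKENDS.items() if find_spec(module) is not None]


def pick_backend(preferred=None, installed=None, default="pypdfium2"):
    """Choose a backend name: the explicit request if installed, else the default."""
    if installed is None:
        installed = available_backends()
    if preferred is not None:
        if preferred not in BACKENDS:
            raise ValueError(f"unknown PDF backend: {preferred!r}")
        if preferred not in installed:
            # The optional dependency was requested but is not installed.
            raise ImportError(f"PDF backend {preferred!r} is not installed")
        return preferred
    if default in installed:
        return default
    if installed:
        # Fall back to whatever is present if the default is missing.
        return installed[0]
    raise ImportError("no supported PDF backend is installed")


if __name__ == "__main__":
    print("installed backends:", available_backends())
```

With a layout like this, the AGPL-licensed library would only be imported when a user installs it and asks for it explicitly, while the default codepath stays on pypdfium2. Whether that separation is enough for a given licensing situation is a separate question the sketch does not answer.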