VikParuchuri/marker
Convert PDF to markdown quickly with high accuracy
https://www.datalab.to
GNU General Public License v3.0
Very early commercial marker preview #107
Closed (VikParuchuri closed 2 months ago)
VikParuchuri commented 2 months ago
Essentially a rewrite of marker:
- Swap layoutlm for new layout model from surya
- Add reading order model
- Add text detection and surya OCR model
- Redo heuristics for OCR
- Remove some system dependencies (now optional, only needed for tesseract OCR)
- Improve table handling
- Remove pymupdf in favor of pypdfium/pdftext
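
To illustrate the point about system dependencies becoming optional, here is a minimal sketch of one way to treat the tesseract binary as an optional dependency. It is not marker's actual code: the function name `pick_ocr_engine`, the engine labels, and the fallback policy are all assumptions made for illustration. It relies only on the standard-library `shutil.which` to check whether an executable is on the PATH.

```python
import shutil
from typing import Callable, Optional


def pick_ocr_engine(
    preferred: str = "surya",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the OCR engine label to use (hypothetical helper, not marker's API).

    If tesseract is requested but its binary is not on PATH, fall back to
    "surya", which in this sketch is assumed to need no system binary.
    """
    if preferred == "tesseract" and which("tesseract") is None:
        # The optional system dependency is missing, so use the fallback.
        return "surya"
    return preferred


if __name__ == "__main__":
    print(pick_ocr_engine("tesseract"))
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is enough.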