
AiDAPT-A / VisArchPy

Pipelines for the extraction and processing of visuals from PDFs

https://visarchpy.readthedocs.io

MIT License


Improvements to pipeline for the extraction of visuals #41

Closed: manuGil closed this issue 1 year ago

manuGil commented 1 year ago

Layout analysis with pdfminer.six
OCR analysis with Tesseract
Combine layout and OCR analyses to improve vector-based visual extraction
Update package metadata and installation instructions
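The issue does not say how the layout and OCR analyses are combined. One possible heuristic, sketched here purely as an assumption and not as the VisArchPy implementation, is to map OCR word boxes into PDF page coordinates and drop vector-graphic candidate regions that are mostly covered by text. The only external conventions assumed are that pdfminer.six reports a layout object's `bbox` as `(x0, y0, x1, y1)` in PDF points with the origin at the bottom-left, and that Tesseract's word-level output (for example via `pytesseract.image_to_data`) reports `left`, `top`, `width`, `height` in pixels from the top-left of the rendered page image. The names `Box`, `ocr_to_pdf_box`, `text_coverage`, `filter_vector_candidates` and the `0.5` threshold are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical bounding box in PDF user space: points, origin at bottom-left,
# following the (x0, y0, x1, y1) order used by pdfminer.six's `bbox`.
@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def intersection(self, other: "Box") -> float:
        # Area of overlap between two boxes (0.0 if they do not overlap).
        w = min(self.x1, other.x1) - max(self.x0, other.x0)
        h = min(self.y1, other.y1) - max(self.y0, other.y0)
        return w * h if w > 0 and h > 0 else 0.0


def ocr_to_pdf_box(left: float, top: float, width: float, height: float,
                   page_height_pt: float, dpi: float) -> Box:
    """Convert a Tesseract word box (pixels, top-left origin) to PDF points
    (bottom-left origin), assuming the page was rendered at `dpi`."""
    scale = 72.0 / dpi  # 1 PDF point = 1/72 inch
    x0 = left * scale
    x1 = (left + width) * scale
    # Flip the vertical axis: image rows grow downwards, PDF y grows upwards.
    y1 = page_height_pt - top * scale
    y0 = page_height_pt - (top + height) * scale
    return Box(x0, y0, x1, y1)


def text_coverage(region: Box, words: list[Box]) -> float:
    """Fraction of `region` overlapped by OCR word boxes, capped at 1.0.
    Overlaps between words are summed, not deduplicated."""
    if region.area() == 0:
        return 0.0
    covered = sum(region.intersection(w) for w in words)
    return min(1.0, covered / region.area())


def filter_vector_candidates(candidates: list[Box], words: list[Box],
                             max_text_ratio: float = 0.5) -> list[Box]:
    """Keep candidate regions whose text coverage does not exceed the
    (hypothetical) `max_text_ratio` threshold."""
    return [c for c in candidates if text_coverage(c, words) <= max_text_ratio]


if __name__ == "__main__":
    # A US Letter page (792 pt tall) rendered at 144 dpi, so 1 px = 0.5 pt.
    word = ocr_to_pdf_box(100, 100, 200, 40, page_height_pt=792, dpi=144)
    text_block = Box(50, 720, 150, 745)   # mostly covered by the OCR word
    diagram = Box(100, 100, 400, 400)     # no OCR text inside
    print(filter_vector_candidates([text_block, diagram], [word]))
```

Here `text_block` is about 80% covered by the OCR word and is dropped, while `diagram` contains no recognised text and is kept. Summing word overlaps keeps the sketch short; a real pipeline would likely merge overlapping word boxes first and tune the threshold on labelled pages.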