Madgrades / madgrades-extractor

UW-Madison course and grade distribution data extraction tool.
MIT License

Explore other options for PDF parsing #24

Open thekeenant opened 7 months ago

thekeenant commented 7 months ago

For example: https://github.com/facebookresearch/nougat and https://github.com/microsoft/table-transformer

thekeenant commented 7 months ago

Amazon's Textract works well out of the box for extracting tables: https://aws.amazon.com/textract/
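For reference, here is a rough sketch of what pulling tables out of a Textract response could look like with boto3. It is not code from this repo: the function names `tables_from_blocks` and `extract_tables`, the default region, and the assumption that the input is the raw bytes of a single page are all made up for illustration. The parser relies only on the documented block structure (`TABLE` blocks pointing to `CELL` children, which point to `WORD` children).

```python
from collections import defaultdict


def tables_from_blocks(blocks):
    """Rebuild each TABLE block as a list of rows of cell text."""
    by_id = {b["Id"]: b for b in blocks}

    def child_ids(block):
        # Blocks without children have no "Relationships" key at all.
        for rel in block.get("Relationships", []):
            if rel["Type"] == "CHILD":
                yield from rel["Ids"]

    def cell_text(cell):
        words = [by_id[i]["Text"] for i in child_ids(cell)
                 if by_id[i]["BlockType"] == "WORD"]
        return " ".join(words)

    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        # grid[row][column] -> text; Textract indices are 1-based.
        grid = defaultdict(dict)
        for cid in child_ids(block):
            cell = by_id[cid]
            if cell["BlockType"] == "CELL":
                grid[cell["RowIndex"]][cell["ColumnIndex"]] = cell_text(cell)
        rows = []
        for r in sorted(grid):
            width = max(grid[r])
            rows.append([grid[r].get(c, "") for c in range(1, width + 1)])
        tables.append(rows)
    return tables


def extract_tables(page_bytes, region_name="us-east-1"):
    """Hypothetical helper: send one page to Textract and parse its tables."""
    import boto3  # imported lazily so the parser above works without AWS

    client = boto3.client("textract", region_name=region_name)
    response = client.analyze_document(
        Document={"Bytes": page_bytes},
        FeatureTypes=["TABLES"],
    )
    return tables_from_blocks(response["Blocks"])
```

Keeping the parser separate from the API call means it can be tested against saved responses without AWS credentials. Multi-page PDFs would likely need the asynchronous API instead, which this sketch does not cover.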

Google's Document AI did not work as well out of the box: https://cloud.google.com/document-ai?hl=en It would require a significant amount of training and test data (10+ of each), and there is still no guarantee it would work well. That said, the format would be very structured.