LWaetzig / StudentChatbot


create own tesseract model #12

Open LWaetzig opened 9 months ago

LWaetzig commented 9 months ago

Objective

Create a custom Tesseract model, used through pytesseract, to improve text extraction from PDF files. Compare the results with basic extraction using PyMuPDF or PyPDF2.
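A rough sketch of how such a comparison could look, assuming PyMuPDF (`fitz`), pytesseract, and Pillow are installed and that a hand-checked reference text exists for each test PDF. The function names, the `lang` default, and the similarity metric are illustrative assumptions, not something already in the repo. A custom-trained model would be selected by passing its traineddata name as `lang`.

```python
from difflib import SequenceMatcher


def extract_with_pymupdf(pdf_path: str) -> str:
    """Baseline: read the embedded text layer with PyMuPDF."""
    import fitz  # PyMuPDF, imported lazily so the scoring helper works without it

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_with_tesseract(pdf_path: str, lang: str = "eng", dpi: int = 300) -> str:
    """OCR route: render each page to an image and run it through pytesseract.

    `lang` is an assumption: replace it with the name of the custom
    traineddata file once an own model exists.
    """
    import io

    import fitz
    import pytesseract
    from PIL import Image

    texts = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)  # rasterize the page
            image = Image.open(io.BytesIO(pix.tobytes("png")))
            texts.append(pytesseract.image_to_string(image, lang=lang))
    return "\n".join(texts)


def similarity(extracted: str, reference: str) -> float:
    """Score in [0, 1]: character-level similarity after collapsing whitespace."""
    a = " ".join(extracted.split())
    b = " ".join(reference.split())
    return SequenceMatcher(None, a, b).ratio()
```

Running both extractors on the same PDF and scoring each against the same reference text with `similarity` would give one number per method, which is enough for a first comparison. A word-level or character-error-rate metric might be a better fit later.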

Key Features

lanteanair commented 9 months ago

Instead of Tesseract, I'm exploring the viability of layoutparser: https://layout-parser.github.io/

I will take a deeper look at that next week; the other features would remain unchanged.
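For reference, a minimal sketch of what trying the layoutparser base model could look like, assuming layoutparser is installed with its detectron2 backend. The PubLayNet config string and label map follow the layoutparser documentation; the helper names, the score threshold, and the simple top-to-bottom ordering are assumptions for illustration only.

```python
def detect_text_regions(image):
    """Return (x1, y1, x2, y2) boxes for text-like blocks on a page image.

    `image` is expected to be an RGB page image as a numpy array.
    """
    import layoutparser as lp  # imported lazily; needs the detectron2 backend

    model = lp.Detectron2LayoutModel(
        "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",
        extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],  # assumed threshold
        label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    )
    layout = model.detect(image)
    return [
        tuple(block.coordinates)
        for block in layout
        if block.type in ("Text", "Title", "List")
    ]


def reading_order(boxes):
    """Naive ordering: top to bottom, then left to right (single-column pages only)."""
    return sorted(boxes, key=lambda box: (box[1], box[0]))
```

Each ordered box could then be cropped and passed to an OCR step, so the result can be compared against plain PDF extraction on the same pages.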

lanteanair commented 8 months ago

The layoutparser base model performs worse than standard PDF extraction. Next step: exploring the creation of our own model by training a custom detectron2 model, like this: https://www.youtube.com/watch?v=puOKTFXRyr4