anakib1 / MangoTruth

Open source infrastructure for AI plagiarism detection
PDF, DOCX formatter #12

Open Silence-o0 opened 1 month ago

Silence-o0 commented 1 month ago

Develop a formatter that parses PDF and DOCX files and extracts their text and tables, including from documents with complex layouts.

Note: this can presumably be implemented using two different approaches.
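
To make the DOCX half of the request concrete, here is a minimal sketch, not a proposed final design. It assumes a `.docx` file is a ZIP archive whose main content lives in `word/document.xml` (the standard WordprocessingML layout) and uses only the Python standard library. The names `extract_docx` and `_paragraph_text` are hypothetical and do not exist in the repository. It reads only top-level paragraphs and tables, so nested tables, headers, footers, and text boxes are ignored. PDF parsing is not covered here, since it would need a third-party library.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by every element in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _paragraph_text(paragraph):
    """Concatenate all text runs (w:t) found inside one paragraph element."""
    return "".join(t.text or "" for t in paragraph.iter(W + "t"))


def extract_docx(source):
    """Return (paragraphs, tables) from a .docx path or binary file object.

    paragraphs: list of non-empty top-level paragraph strings, in order.
    tables: list of tables, each a list of rows, each a list of cell strings.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    body = root.find(W + "body")
    paragraphs, tables = [], []
    for child in body:
        if child.tag == W + "p":
            text = _paragraph_text(child)
            if text.strip():
                paragraphs.append(text)
        elif child.tag == W + "tbl":
            rows = []
            # Direct children only, so rows of nested tables are skipped.
            for row in child.findall(W + "tr"):
                cells = []
                for cell in row.findall(W + "tc"):
                    # A cell may hold several paragraphs; join them with newlines.
                    cells.append(
                        "\n".join(_paragraph_text(p) for p in cell.findall(W + "p"))
                    )
                rows.append(cells)
            tables.append(rows)
    return paragraphs, tables
```

Calling `extract_docx("report.docx")` (the file name is only an example) would return the paragraphs and tables as plain Python lists, which keeps the output easy to pass to whatever detection step comes next.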

anakib1 commented 1 week ago

@GeorgyPetriv please write your high-level thoughts here (what should be done and how).