GerevAI / gerev

🧠 AI-powered enterprise search engine 🔎
https://app.klu.so/signup?utm_source=github_gerevai
MIT License
2.71k stars, 178 forks

PDF Parser, GoogleDrive support for PDF, README.md minor fix #36

Closed (d4yz closed this 1 year ago)

Roey7 commented 1 year ago

@bary12 do we want pdf -> html -> text here, so we know titles, bold, etc., like we do for docx?

bary12 commented 1 year ago

> @bary12 do we want pdf -> html -> text here, so we know titles, bold, etc., like we do for docx?

Yes, just for the titles.

Roey7 commented 1 year ago

@d4yz so we need pdf_to_html, and then use html_to_text, like we do for .docx
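
A rough sketch of the pipeline discussed above, in case it helps. It is not gerev's actual code: the `html_to_text` below is a stand-in written with the standard library only, and `pdf_to_html` is passed in as a hypothetical callable because the thread does not say which PDF library would produce the HTML. It also assumes that the HTML marks titles with `<h1>` to `<h6>` tags, which a given PDF converter may not do.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = HEADING_TAGS | {"p", "div", "li", "br"}


class _TextExtractor(HTMLParser):
    """Collects visible text block by block and flags heading blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (is_title, text) tuples
        self._buffer = []       # text pieces of the block being read
        self._in_heading = False

    def _flush(self):
        # Join the buffered pieces and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.blocks.append((self._in_heading, text))
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()       # a new block starts, close the previous one
        if tag in HEADING_TAGS:
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()           # keep any trailing text after the last tag


def html_to_text(html):
    """Stand-in converter: returns [(is_title, text), ...] for an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def pdf_to_text(pdf_bytes, pdf_to_html):
    """pdf -> html -> text; `pdf_to_html` is a hypothetical bytes -> str callable."""
    return html_to_text(pdf_to_html(pdf_bytes))
```

Inline tags such as `<b>` are not treated as block boundaries, so bold text stays inside its paragraph. Only the title flag is kept, which matches the "just for the titles" answer above.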