Isaiah512 / Reviewer

Extracts text and summarizes it into key points and bullet lists, using LangChain and Google's Gemini.
MIT License

Feature Request: Extract Text from PowerPoint Files #1

Closed Isaiah512 closed 1 week ago

Isaiah512 commented 2 weeks ago

I would like to request the addition of a feature that allows the application to extract text from PowerPoint files (.ppt and .pptx), in addition to the existing PDF functionality.

FurqanHun commented 2 weeks ago

Hi, I can help with this! I've already cloned the repo and made some improvements to accept file paths from CLI arguments or through user prompts (instead of the hard-coded "file_path").
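A minimal sketch of that idea, for illustration only: take the path from the command line when one is given, otherwise fall back to a prompt. The function name `resolve_file_path` and the prompt wording are assumptions, not code from the repo.

```python
def resolve_file_path(argv, prompt=input):
    """Return the file path from argv if present, else ask the user.

    `argv` is expected to look like sys.argv (program name first).
    `prompt` is injectable so the fallback can be exercised without a terminal.
    """
    # Prefer an explicit command-line argument when one was supplied.
    if len(argv) > 1 and argv[1].strip():
        return argv[1].strip()
    # Otherwise ask interactively (hypothetical prompt text).
    return prompt("Enter the path to the file: ").strip()
```

In a script this would typically be called as `resolve_file_path(sys.argv)` after `import sys`.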

For .ppt support, we have two possible approaches:

  1. Try reading the .ppt file directly using python-pptx, and if that fails, prompt the user to convert it to .pptx
  2. Add automatic file conversion (though this would require OS-specific handling, such as MS Office on Windows, LibreOffice on Linux, and so on)

I plan to implement the first approach using python-pptx since it's simpler and more portable across different operating systems. Would you like me to proceed with it?
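The first approach could look roughly like the sketch below. It assumes python-pptx is installed (`pip install python-pptx`); the function names are hypothetical, and the broad `except` is deliberate because the exact exception raised for an unreadable file is not assumed here.

```python
def extract_text(presentation):
    """Collect the text of every text-bearing shape, slide by slide."""
    lines = []
    for slide in presentation.slides:
        for shape in slide.shapes:
            # Pictures, charts, etc. have no text frame, so skip them.
            if shape.has_text_frame and shape.text_frame.text.strip():
                lines.append(shape.text_frame.text.strip())
    return "\n".join(lines)


def extract_text_from_file(file_path):
    """Open the file with python-pptx and fall back to a conversion hint."""
    # Imported lazily so extract_text() stays usable without the dependency.
    from pptx import Presentation

    try:
        presentation = Presentation(file_path)
    except Exception as exc:  # exact exception type is not assumed
        raise ValueError(
            f"Could not read {file_path!r}. If it is a legacy .ppt file, "
            "please convert it to .pptx and try again."
        ) from exc
    return extract_text(presentation)
```

Keeping the traversal in `extract_text` separate from file loading means the extraction logic can be checked without a real presentation on disk.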

FurqanHun commented 2 weeks ago

Got it to work with ppt:

[image]

However, there's one small problem with it: I used unoconv (on Linux), which is deprecated, and I don't want to install unoserver (its successor)... so I'm trying to implement an auto converter (which tries using:

As for Windows, it would require PowerPoint for conversion...
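An auto converter along these lines might be sketched as follows. The list of tools in the comment above is cut off, so the candidates here (LibreOffice's `soffice` first, then `unoconv`) are assumptions for illustration, as are the function names; a PowerPoint-based Windows path is not covered.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(ppt_path, out_dir, which=shutil.which):
    """Return a command list for the first converter found on PATH, or None.

    `which` is injectable so the lookup can be checked without the tools installed.
    """
    ppt_path, out_dir = str(ppt_path), str(out_dir)
    if which("soffice"):  # assumed candidate: LibreOffice in headless mode
        return ["soffice", "--headless", "--convert-to", "pptx",
                "--outdir", out_dir, ppt_path]
    if which("unoconv"):  # assumed candidate: deprecated, kept as a fallback
        return ["unoconv", "-f", "pptx", "-o", out_dir, ppt_path]
    return None


def convert_to_pptx(ppt_path, out_dir="."):
    """Convert a .ppt file and return the expected path of the .pptx output."""
    command = build_convert_command(ppt_path, out_dir)
    if command is None:
        raise RuntimeError(
            "No converter found; please convert the file to .pptx manually."
        )
    subprocess.run(command, check=True)  # raises if the converter fails
    return Path(out_dir) / (Path(ppt_path).stem + ".pptx")
```

If no converter is available, the error message hands the conversion back to the user, which matches the manual fallback of the first approach.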

Isaiah512 commented 2 weeks ago

Thank you! Please go ahead with this approach

FurqanHun commented 1 week ago

Alright, I'll clean up the code and send the PR in a few hours.