SciPhi-AI / R2R

Containerized, state-of-the-art Retrieval-Augmented Generation (RAG) system with a RESTful API
https://r2r-docs.sciphi.ai/
MIT License
3.73k stars, 281 forks

Add support for more book formats (e.g., EPUB, AZW3, etc.) #1144

Closed: AriaShishegaran closed this issue 2 months ago

AriaShishegaran commented 2 months ago

Is your feature request related to a problem? Please describe.
I'd like to request the ability for R2R to ingest more book formats, such as EPUB and AZW3, for RAG. This would let the system work with a wider set of widely used book formats, and it would help users who don't have a PDF version of a book or whose book has no PDF version at all.

Describe the solution you'd like
Add support for more e-book formats.

Describe alternatives you've considered
LlamaParse/LlamaIndex supports this, and EPUBs are naturally easier to parse and understand since they are just repackaged HTML. I assume this should be a fairly straightforward problem to solve.
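To illustrate the "repackaged HTML" point: an EPUB is a ZIP archive whose chapters are (X)HTML files, so a rough text extraction needs only the Python standard library. The sketch below is an assumption-laden illustration, not how R2R or Unstructured handle EPUB; the names `epub_to_text` and `make_demo_epub` are hypothetical, and it reads files in archive order rather than the reading order declared in the EPUB's package document.

```python
import io
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def epub_to_text(source):
    """Return one text string per (X)HTML file in the EPUB (hypothetical helper).

    `source` is a path or a binary file-like object. Files are visited in
    archive order, which may differ from the book's actual reading order.
    """
    chunks = []
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                parser.close()
                chunks.append(" ".join(parser.parts))
    return chunks


def make_demo_epub():
    """Build a tiny in-memory EPUB-like archive for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr(
            "OEBPS/ch1.xhtml",
            "<html><body><h1>Chapter 1</h1><p>Hello, EPUB.</p></body></html>",
        )
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for chapter in epub_to_text(make_demo_epub()):
        print(chapter)
```

A real ingestion pipeline would also need to follow the reading order from the package document and handle DRM-protected or malformed files, which this sketch ignores.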

shreyaspimpalgaonkar commented 2 months ago

Unstructured supports EPUB, so I'll add that to R2R today. AZW3 seems to be an older Amazon Kindle format (MOBI is the current one), so I think supporting it will take slightly longer. Until then, perhaps you can convert AZW3 files with an online converter:

https://cloudconvert.com/azw3-to-pdf

shreyaspimpalgaonkar commented 2 months ago

Fixed in https://github.com/SciPhi-AI/R2R/pull/1157. Will merge into main today.