adithya-s-k / omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
https://omniparse.cognitivelab.in/
GNU General Public License v3.0
5.71k stars 466 forks

Markdown documents not supported? #7

Open sammcj opened 4 months ago

sammcj commented 4 months ago

I went to try out Omniparse (looks great!), but when I tried to upload my documents I was met with an error stating that Markdown documents aren't supported.


This really surprised me, given that most wikis, knowledge bases, code/library/project documentation, PKI data, etc. are Markdown (i.e. plain text).

Why would I want to parse data that's already Markdown, you might ask?

qyou commented 4 months ago

So far only doc/docx/ppt/pptx/pdf file formats are supported.
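Until more formats land, a client-side check before upload can avoid the error. The following is a minimal sketch, assuming only the extension list given in this thread; `SUPPORTED_EXTENSIONS`, `is_supported`, and `split_by_support` are hypothetical names for illustration and are not part of Omniparse's API.

```python
from pathlib import Path

# Extensions taken from the list in this thread (an assumption, not read
# from Omniparse itself; the real set may differ or change).
SUPPORTED_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx", ".pdf"}


def is_supported(filename: str) -> bool:
    """Hypothetical helper: True if the file extension is in the listed set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames):
    """Partition filenames into (supported, unsupported), preserving order."""
    supported, unsupported = [], []
    for name in filenames:
        (supported if is_supported(name) else unsupported).append(name)
    return supported, unsupported
```

For example, `split_by_support(["a.docx", "b.md", "c.html"])` would place `a.docx` in the first list and the Markdown and HTML files in the second, so they can be skipped or converted before upload.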

adithya-s-k commented 4 months ago

As mentioned by @qyou, we only support doc/docx/ppt/pptx/pdf. All the points you mentioned are valid, @sammcj. We will integrate normal text-based and table-based file types as well.

For chunking, please indicate which of the following is your preferred chunking strategy:

Your feedback will greatly help, @sammcj and @qyou.

chenyang-shanghai commented 3 months ago

It would be great to add support for local HTML files (some websites require login first, so a URL is not a good option).