adithya-s-k / omniparse

Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
https://docs.cognitivelab.in
GNU General Public License v3.0
3.54k stars 274 forks

Markdown documents not supported? #7

Open sammcj opened 1 week ago

sammcj commented 1 week ago

I went to try out Omniparse (looks great!), but when I tried to upload my documents I was met with an error stating that Markdown documents aren't supported.

This really surprised me, given that most wikis, knowledge bases, code/library/project documentation, PKI data, etc. are Markdown (i.e. plain text).

Why would I want to parse data that's already Markdown, you might ask?

qyou commented 6 days ago

So far, only the doc/docx/ppt/pptx/pdf file formats are supported.
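
To illustrate the restriction, here is a minimal sketch of an extension check a caller could run before uploading, so that unsupported files such as Markdown are caught early. It is not Omniparse's actual code: the names `SUPPORTED_EXTENSIONS`, `is_supported`, and `partition_uploads` are hypothetical, and the extension set is taken only from the formats listed in this thread.

```python
from pathlib import Path

# Assumption: the formats listed in this thread (doc/docx/ppt/pptx/pdf).
SUPPORTED_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx", ".pdf"}


def is_supported(filename: str) -> bool:
    """Hypothetical helper: True if the file extension is in the supported set."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def partition_uploads(filenames):
    """Hypothetical helper: split filenames into (accepted, rejected) lists."""
    accepted = [name for name in filenames if is_supported(name)]
    rejected = [name for name in filenames if not is_supported(name)]
    return accepted, rejected


if __name__ == "__main__":
    ok, skipped = partition_uploads(["report.PDF", "slides.pptx", "README.md"])
    print("accepted:", ok)
    print("rejected:", skipped)
```

With this check, a file like `README.md` ends up in the rejected list. That matches the behaviour reported in this issue, at least until text-based formats are added.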

adithya-s-k commented 5 days ago

As mentioned by @qyou, we only support doc/docx/ppt/pptx/pdf. All the points you mentioned are valid, @sammcj. We will integrate normal text-based and table-based file types as well.

For chunking, please indicate which of the following is your preferred chunking strategy:

Your feedback will greatly help, @sammcj and @qyou.