Filimoa / open-parse

Improved file parsing for LLMs
https://filimoa.github.io/open-parse/
MIT License
2.34k stars 89 forks

🚀 Roadmap #1

Open Filimoa opened 5 months ago

Filimoa commented 5 months ago

Description

This is a tentative roadmap; I will update it as things evolve.

Roadmap

High Priority:

Long Term:

shekhars-li commented 5 months ago

Hey @Filimoa, do you plan to add support for UniTable anytime soon? It seems like the docs mention it, but the notebook does not have an example for it. Thanks for creating this project.

Filimoa commented 5 months ago

As soon as the pre-trained weights are released, I'll be adding it. I talked with ShengYun earlier this week, and it sounds like they'll be released ASAP.

shekhars-li commented 5 months ago

@Filimoa Looks like pretrained weights are available now! :)

Filimoa commented 5 months ago

In progress! Should be merged in by the end of the week.

Filimoa commented 5 months ago

Just merged - try it out. It requires downloading weights, which you can read about here. We need to find a better model for table detection, but this performs incredibly well otherwise.

Ulipenitz commented 5 months ago

Hey @Filimoa! Really great project!! Have you thought about using open-source models for the semantic processing? You can find even better embedding models here: https://huggingface.co/spaces/mteb/leaderboard This one is especially promising (only 0.67GB & better than text-embedding-3-large): https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1 There are also ONNX models that run pretty fast on CPUs.

Filimoa commented 5 months ago

Added to the roadmap! Will ship very soon @Ulipenitz

cthompson-insight commented 5 months ago

Would be great to support Azure OpenAI as well.

zishengwu commented 1 month ago

Hey @Filimoa! Have you tried PaddleOCR? In my experience, it performs well for layout analysis and table recognition.