getomni-ai / zerox

Zero-shot PDF OCR with gpt-4o-mini
https://getomni.ai/
MIT License
1.38k stars, 48 forks

Please provide Python API #2

Closed saitej123 closed 1 day ago

saitej123 commented 1 month ago

Please provide Python API

tylermaran commented 1 month ago

I'd love to support a Python API and publish a package on pip. Right now neither of the maintainers is a particularly strong Python dev, but if you know anyone who would want to make a contribution, let us know!

The roadmap so far is:

batmanscode commented 1 month ago

@tylermaran I've been thinking of building a version of this for myself for a while now and I was so excited to see your project on HN so that I didn't have to build it myself haha

Let me look into this. Maybe I can help out with a python package

tylermaran commented 1 month ago

Hey @batmanscode 🦇

I would love to have some help here. It looks like there is a similar pip package for pdf2image. https://github.com/Belval/pdf2image

It uses poppler under the hood. I wonder if there's a variant that uses ImageMagick like the current Node version does. But either way it should be pretty easy to set up. Within the npm setup we have an install-dependencies script to make sure all the prereqs are set up.
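As a rough idea of what the PDF-to-image step could look like in Python, here is a minimal sketch built on pdf2image's `convert_from_path`. The function name `pdf_to_page_images`, the `page_N.png` naming, and the default DPI are assumptions for illustration, not anything zerox defines, and poppler still has to be installed separately.

```python
from pathlib import Path


def pdf_to_page_images(pdf_path, out_dir, dpi=200):
    """Render each page of a PDF to a PNG file and return the file paths.

    Hypothetical helper: the name, output naming, and default DPI are
    illustrative assumptions, not part of zerox.
    """
    pdf_path = Path(pdf_path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_path}")

    # Imported lazily so this module loads even when pdf2image
    # (and its poppler dependency) is not installed.
    from pdf2image import convert_from_path

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # convert_from_path returns one PIL image per page.
    pages = convert_from_path(str(pdf_path), dpi=dpi)

    paths = []
    for number, page in enumerate(pages, start=1):
        target = out_dir / f"page_{number}.png"
        page.save(target, "PNG")
        paths.append(target)
    return paths
```

Each returned path could then be sent to the model one page at a time.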

I'd like to keep this as a monorepo if possible. Probably something like:

zerox/
├── .gitignore
├── README.md
├── LICENSE
├── package.json     # npm config
├── setup.py         # pip config
├── node-zerox/      # typescript source
│   ├── src/
│   ├── dist/
│   ├── tests/
│   └── etc/
└── py-zerox/        # python source
    ├── src/
    ├── tests/
    └── etc/

wizenheimer commented 1 month ago

Hey @tylermaran and @batmanscode, this looks interesting and I would love to collaborate. I have experience with both TypeScript and Python package development.

I've reviewed the zerox source and can help port it to Python. My goal would be to ensure that the API and build process remain consistent across both the TypeScript and Python implementations.

Looking forward to working together!

wizenheimer commented 1 month ago

Hey @tylermaran, quick update: I've opened PR #4, which proposes the monorepo structure for Zerox. It includes Poetry for dependency management, a Makefile for build automation, and some code quality checks. The current implementations are placeholders. The actual implementation details will be added once the proposed structure gets reviewed and approved :D

saitej123 commented 1 month ago

Can gpt-4o-mini also provide bounding box details? I'd like to highlight key information in the document.

tylermaran commented 1 month ago

@saitej123 I've been looking into this as well. It doesn't seem to be immediately available using gpt-4o-mini.

I know it's possible to use a library like YOLOv8 to grab bounding boxes. But that gets a little harder when you have to host an additional model.

I think the general flow would be:

  1. Parse the document with gpt-4o-mini
  2. Split the resulting markdown into semantic sections (i.e. headers, subheaders, tables, etc.)
  3. For each semantic section, use some tool to find bounding boxes in the original image
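Step 2 could be sketched roughly like this. It is a minimal, standard-library-only splitter that assumes the model returns plain markdown where headers start with `#` and table rows start with `|`. The `Section` type and `split_markdown` name are made up for illustration and are not part of zerox.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Section:
    kind: str  # "header", "table", or "paragraph"
    text: str


# A markdown ATX header: one to six '#' characters followed by text.
HEADER_RE = re.compile(r"^#{1,6}\s+\S")


def split_markdown(markdown: str) -> List[Section]:
    """Split markdown into header, table, and paragraph sections."""
    sections: List[Section] = []
    buffer: List[str] = []
    buffer_kind: Optional[str] = None

    def flush() -> None:
        # Emit whatever has been collected so far as one section.
        nonlocal buffer, buffer_kind
        if buffer:
            sections.append(Section(buffer_kind, "\n".join(buffer)))
        buffer, buffer_kind = [], None

    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()  # a blank line ends the current section
        elif HEADER_RE.match(stripped):
            flush()
            sections.append(Section("header", stripped))
        else:
            kind = "table" if stripped.startswith("|") else "paragraph"
            if kind != buffer_kind:
                flush()  # switching between table and paragraph
                buffer_kind = kind
            buffer.append(stripped)
    flush()
    return sections
```

Each `Section` could then be handed to whatever tool ends up doing the bounding-box lookup in step 3.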

This is a bit separate from the python request, so I added a tracking issue #7

saitej123 commented 1 month ago

If we use Azure OCR or GCP, we can map the bounding boxes from their output. I'm not sure, though: the mapping may fail if they split the text in a different way.
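A rough sketch of that mapping, assuming the OCR service returns a flat list of words with pixel boxes. The `OcrWord` type, the `locate_section` name, and the 0.6 threshold are assumptions for illustration and do not match any real Azure or GCP response format. It aligns the section's words against the OCR words with `difflib` and gives up when too few of them line up, which is the failure case when the text is split differently.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class OcrWord:
    text: str
    box: Box


def _normalize(token: str) -> str:
    # Drop markdown punctuation and case so "**Total:**" matches "Total:".
    return "".join(ch for ch in token.lower() if ch.isalnum())


def locate_section(
    section_text: str, words: List[OcrWord], min_ratio: float = 0.6
) -> Optional[Box]:
    """Return a box enclosing the OCR words that match the section, or None."""
    target = [t for t in (_normalize(tok) for tok in section_text.split()) if t]
    ocr = [_normalize(w.text) for w in words]
    if not target or not ocr:
        return None

    matcher = SequenceMatcher(None, ocr, target, autojunk=False)
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]

    # Require that enough of the section's words were found in order.
    matched = sum(b.size for b in blocks)
    if matched / len(target) < min_ratio:
        return None

    hits = [words[i] for b in blocks for i in range(b.a, b.a + b.size)]
    return (
        min(w.box[0] for w in hits),
        min(w.box[1] for w in hits),
        max(w.box[2] for w in hits),
        max(w.box[3] for w in hits),
    )
```

The matched words can be scattered across the page, so the enclosing box may be larger than the section really is. A stricter version would probably need to check that the hits are close together.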

tylermaran commented 1 month ago

@wizenheimer merged your repo updates for the python package in #4

Great work. Now we just need to add the core logic.

wizenheimer commented 1 month ago

Hey @tylermaran, I've opened PR #10, which introduces a Python SDK for Zerox. I made sure the external API and types remain consistent across the SDKs.

RazvanMihaiPopa commented 4 weeks ago

Could you add a usage section for python in the README?

guici123 commented 1 week ago

Could you add a usage section for python in the README? @tylermaran

pradhyumna85 commented 1 week ago

@guici123, @RazvanMihaiPopa have a look at this PR https://github.com/getomni-ai/zerox/pull/21, should be useful.