Inconsistent OCR Results Between Local pyzerox Usage and Online Demo

vizero1 commented 3 weeks ago

Hello, I'm experiencing a significant difference in OCR results between the pyzerox library used locally and the online OCR demo.

When I upload a PDF to the online OCR demo, the output is accurate and well-structured. However, when I use the same PDF with pyzerox locally (Python 3.12), I notice the following inconsistencies:

- Some rows of text are missing.
- Certain pages are duplicated.
- Characters are occasionally omitted or garbled.

While I expect some minor differences, these discrepancies are large enough to impact the usefulness of the OCR results. I would appreciate any information on the configuration or settings used in the online demo to help reproduce similar results locally.
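For anyone wanting to quantify these symptoms before changing settings, here is a minimal sketch using only the Python standard library. It assumes you already have the local output as a list of per-page strings and the demo output as one string. The helper names (`duplicated_pages`, `missing_lines`, `similarity`) are hypothetical and not part of pyzerox.

```python
import difflib


def duplicated_pages(pages):
    """Return (first_index, repeat_index) pairs for pages whose
    whitespace-normalized text repeats an earlier page."""
    seen = {}
    dupes = []
    for i, text in enumerate(pages):
        key = " ".join(text.split())  # collapse whitespace differences
        if key in seen:
            dupes.append((seen[key], i))
        else:
            seen[key] = i
    return dupes


def missing_lines(reference, candidate):
    """Return non-empty lines of `reference` that do not appear
    (after stripping) anywhere in `candidate`."""
    candidate_lines = {line.strip() for line in candidate.splitlines()}
    return [
        line
        for line in reference.splitlines()
        if line.strip() and line.strip() not in candidate_lines
    ]


def similarity(reference, candidate):
    """Rough character-level similarity ratio between 0.0 and 1.0."""
    return difflib.SequenceMatcher(None, reference, candidate).ratio()
```

Running `missing_lines(demo_text, local_text)` would list the rows the local run dropped, and `duplicated_pages(local_pages)` would flag repeated pages, which makes it easier to compare runs with different models.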

Current Configuration: Here's the configuration I'm using:

from pyzerox import zerox


async def process_file_with_zerox(config):
    # Run zerox on the PDF and return the parsed result to the caller.
    result = await zerox(
        file_path=config["destination_file_name"],
        model="gpt-4o-mini",
        cleanup=True,
        output_dir="output/result2.md",
        maintain_format=True,
        # custom_system_prompt=custom_system_prompt_ocr  # Uncomment if necessary
    )
    return result

Environment:

- Python version: 3.12
- pyzerox version: 0.0.7

I cannot share the PDF file here, so it will be hard for you to reproduce this. For a start, it would be great to know the configuration used by the online demo.

tylermaran commented 3 weeks ago

Hey @vizero1. The online demo is using gpt-4o which could explain a big change in the performance. The package defaults to 4o-mini, but as of right now the token cost for both models is largely the same for vision requests. So I would recommend 4o for most use cases.

We're also using the npm package for the demo, which has a bit of image correction work that may improve performance.

Let me know what your results are like if you test with 4o
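A minimal sketch of that test, assuming the same keyword arguments as in the original post with only the model changed to `gpt-4o`. The helpers `build_zerox_kwargs` and `run_ocr` are hypothetical names for illustration. The `zerox` callable is passed in as an argument so the sketch can be tried without an API key, and `output_dir` is assumed here to be a directory rather than a file path.

```python
import asyncio


def build_zerox_kwargs(file_path, model="gpt-4o", output_dir="output"):
    """Keyword arguments mirroring the call in the original post,
    with the model defaulting to gpt-4o instead of gpt-4o-mini."""
    return {
        "file_path": file_path,
        "model": model,
        "cleanup": True,
        "output_dir": output_dir,  # assumption: a directory, not a .md file
        "maintain_format": True,
    }


async def run_ocr(zerox_fn, file_path, **overrides):
    """Await `zerox_fn` (for example pyzerox's `zerox`) with the settings above.
    Any keyword in `overrides` replaces the corresponding default."""
    return await zerox_fn(**build_zerox_kwargs(file_path, **overrides))


# Usage (requires pyzerox installed and an OpenAI API key in the environment):
#   from pyzerox import zerox
#   result = asyncio.run(run_ocr(zerox, "document.pdf"))
```

Passing `model="gpt-4o-mini"` as an override makes it easy to run both models on the same file and compare the outputs.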

vizero1 commented 3 weeks ago

OK, I switched to 4o and the results seem to be better now. Thanks.

getomni-ai / zerox

Inconsistent OCR Results Between Local pyzerox Usage and Online Demo #75