getomni-ai / zerox

PDF to Markdown with vision models
https://getomni.ai/ocr-demo
MIT License

Page order inconsistency in zerox() output #6

Closed MurphyLo closed 3 months ago

MurphyLo commented 3 months ago

When using the zerox() function, the returned data structure contains a pages array. However, there appears to be an inconsistency between the order of pages in this array and their actual page numbers in the original document.

Current Behavior

Example

Taking the paper "DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video" as an example, the function returns:

{
    "completionTime": 73478,
    "fileName": "8538c6b512632605751470a80bbee6c3",
    "inputTokens": 638600,
    "outputTokens": 16045,
    "pages": [
        {
            "content": "# DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video\n\nNarek Tumanyan*, Assaf Singer*, Shai Bagon, Tali Dekel\n\nWeizmann Institute of Science  \n*Indicates equal contribution.  \nProject webpage: [dino-tracker.github.io](http://dino-tracker.github.io)\n\n![Tracking results](image_path)  \n(a) Tracking results  \n(b) Feature refinement  \n\n**Fig. 1:** ...",
            "page": 1,
            "contentLength": 2141
        },
        {
            "content": "# 10 N. Tumanyan, A. Singer et al.\n\n| Method     | DAVIS-256 | DAVIS-480 | Kinetics-256 | Kinetics-480 | BADJA |\n|------------|-----------|-----------|--------------|---------------|-------|\n|            | $\\delta_{OA}$ | $AJ$ | $\\delta_{OA}$ | $AJ$ | $\\delta_{OA}$ | $AJ$ | $\\delta_{OA}$ | $AJ$ | $\\delta_{OA}$ | $AJ$ |\n| RAFT [47]  | 56.7      | 66.7      | 50.4         | 60.3          | 42.0  | 5.8   |\n| DINOv2 [38]| 61.4      | 64.7      | 60.3         | 61.0          | 45.2  | 8.4   |\n| TAP-Net [12]| 63.4      | 81.4      | 64.7         | 69.0          | 61.7  | 11.7  |\n| PIPs++ [63]| 71.5      | 73.6      | 68.2         | 70.8          | 59.0  | 9.8   |\n| TAPIR [13] | 74.7      | 62.3      | 73.9         | 65.9          | 57.3  | 10.5  |\n| Co-Tracker [26]| 72.9  | 63.1      | 74.5         | 65.2          | 59.9  | 9.9   |\n| Omnimotion [51]| 67.5  | 53.1      | 74.1         | 58.4          | 62.9  | 5.0   |\n| Ours*      | 72.8      | 67.2      | 62.3         | 80.4          | 61.6  | 73.8  |\n\n* - supervised. ** ...",
            "page": 2,
            "contentLength": 2909
        },
        {
            "content": "# Taming DINO for Self-Supervised Point Tracking in a Single Video\n\n| Query points | Query points |\n|--------------|--------------|\n| Co-Tracker   | Co-Tracker   |\n| TAPIR        | TAPIR        |\n| Omnimotion   | Omnimotion   |\n| Our          | Our          |\n\n| Query points | Query points |\n|--------------|--------------|\n| Co-Tracker   | Co-Tracker   |\n| TAPIR        | TAPIR        |\n| Omnimotion   | Omnimotion   |\n| Our          | Our          |\n\n**Fig. 4:** ...",
            "page": 3,
            "contentLength": 750
        },
        {
            "content": "12 N. Tumanyan, A. Singer et al.\n\nQuery points\nCo-Tracker\nTAPIR\nOurs\n\nQuery points\nCo-Tracker\nTAPIR\nOurs\n\nFig. 5: Sample results on BADJA w.r.t. ground truth. Query points are color-coded on the frame at the top. Tracked points are marked on the target frames. ...",
            "page": 4,
            "contentLength": 1596
        },

tylermaran commented 3 months ago

Hey @MurphyLo. Thanks for the detailed report.

I've seen this problem before, so I think I know where the bug is coming from. Let me dig in and I'll let you know when I can push up a fix.

GautierT commented 3 months ago

I had a similar issue when I have more than 10 pages. I implemented this fix: https://github.com/getomni-ai/zerox/pull/9. If it's not helpful, don't hesitate to delete it!
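
For readers trying to understand the symptom: in the output above, the entries labelled pages 2, 3 and 4 carry the running headers of pages 10, 11 and 12 of the paper. That pattern, together with the "more than 10 pages" threshold, is what you would see if per-page files were sorted as strings rather than by number. The thread does not confirm this is the actual cause, so the sketch below is only an illustration under that assumption. The `page-N.png` naming scheme and the `natural_key` helper are hypothetical and are not taken from zerox.

```python
import re


def natural_key(name: str) -> list:
    """Split a name into text and integer chunks so 'page-10' sorts after 'page-2'."""
    return [
        int(chunk) if chunk.isdigit() else chunk.lower()
        for chunk in re.split(r"(\d+)", name)
    ]


# Hypothetical per-page image names; the real naming scheme is not shown in the report.
files = [f"page-{n}.png" for n in range(1, 13)]

# Plain string sorting compares character by character, so "10" lands before "2".
lexicographic = sorted(files)

# Sorting on the numeric chunks restores the document order.
natural = sorted(files, key=natural_key)

print(lexicographic[:4])  # ['page-1.png', 'page-10.png', 'page-11.png', 'page-12.png']
print(natural[:4])        # ['page-1.png', 'page-2.png', 'page-3.png', 'page-4.png']
```

With 12 pages, the string-sorted list begins 1, 10, 11, 12, which matches the order of the content in the reported `pages` array. Documents with fewer than 10 pages would sort the same either way, which would explain why the problem only shows up on longer files.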