Closed: MurphyLo closed this issue 3 months ago.
Hey @MurphyLo. Thanks for the detailed report.
I've seen this problem before, so I think I know where the bug is coming from. Let me dig in and I'll let you know when I can push up a fix.
I had a similar issue with documents of more than 10 pages and implemented a fix (https://github.com/getomni-ai/zerox/pull/9). If it's not helpful, don't hesitate to delete it!
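The more-than-10-pages detail is consistent with a common cause of this kind of bug: page images sorted by filename as strings, so "page_10" sorts before "page_2". The sketch below illustrates that hypothesis only; it is not the zerox code or the code from the linked PR, and the page_N.png naming scheme and the page_number helper are hypothetical. It also shows how numbering pages by position after a string sort would make a page's content disagree with its page number, matching the behavior described in this issue.

```python
import re


def page_number(filename: str) -> int:
    """Extract the number before the extension, e.g. 'page_12.png' -> 12 (hypothetical naming)."""
    match = re.search(r"(\d+)(?=\.\w+$)", filename)
    if match is None:
        raise ValueError(f"no page number in {filename!r}")
    return int(match.group(1))


# Hypothetical page-image names for a 12-page document, listed in arbitrary order.
files = [f"page_{n}.png" for n in (3, 11, 1, 10, 2, 12, 4, 5, 6, 7, 8, 9)]

# String sort: page_1, page_10, page_11, page_12, page_2, ...
lexical = sorted(files)

# Numeric sort on the extracted page number: page_1, page_2, ..., page_12
numeric = sorted(files, key=page_number)

# If page numbers are assigned by list position after the string sort,
# "page 2" ends up holding the image (and thus the content) of page 10.
mislabeled = {i + 1: name for i, name in enumerate(lexical)}

print(lexical[:5])
print(numeric[:5])
print(mislabeled[2])
```

Sorting with an explicit numeric key (or carrying the page number alongside each image from the start) avoids relying on filename order at all.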
When using the zerox() function, the returned data structure contains a pages array. However, there appears to be an inconsistency between the order of pages in this array and their actual page numbers in the original document.
Current Behavior
- The pages array in the zerox() output does not maintain the correct order of pages from the original document.
- The content field within each page object does not correspond to the page number specified in the same object.
Example
For example, for the paper DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video, the function returns: