getwithashish opened this pull request 2 months ago (status: Open)
@tylermaran Please have a look at this PR.
Hi @tylermaran! I wanted to check in on my PR #44. If you have any feedback, I’d love to hear it—just making sure it hasn’t gotten lost in the shuffle!
I’m also planning to work on a Node version for this PR, so any input would be super helpful.
Hey @getwithashish! Sorry I sat on this one for so long. I'm starting to really look into bounding boxes now and will be testing out your PR. One thing we're thinking about, though, is pretty much running your method in reverse.
i.e.
I think this method gives a couple of improvements:
Summary
This pull request introduces a new feature that locates the bounding box of each section within an image, making the generated markdown traceable back to its source. Users can toggle this feature to obtain bounding box information for every generated markdown section.
Why
Previously, there was no way to trace which region of the image a given piece of generated markdown came from, which limited the interpretability of the output. This feature closes that gap by providing bounding box coordinates for each markdown section.
Changes
- … when the bounding_box param is set to True (pdf.py)
- … when the bounding_box param is set to True (modellitellm.py)
- … Section type, which includes all the identified sections of a page along with their corresponding bounding boxes (types.py)
- … Page model to include sections and their bounding boxes (zerox.py)
- … pyproject.toml (pyproject.toml)

Functionality
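The Section type and the updated Page model from the Changes list can be pictured roughly as follows. This is a minimal sketch, not the PR's actual types.py: the class layout, the field names content, bounding_box, page and sections, and the 4-value normalized box format are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical box format: four normalized values in [0, 1].
BBox = Tuple[float, float, float, float]


@dataclass
class Section:
    """One markdown section identified on a page (hypothetical fields)."""

    content: str  # markdown text generated for this section
    bounding_box: BBox  # normalized location of the section in the page image


@dataclass
class Page:
    """A converted page plus, when bounding boxes are requested, its sections."""

    page: int  # page number
    content: str  # full markdown for the page
    sections: List[Section] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject boxes that are not normalized, so consumers can rely on [0, 1].
        for section in self.sections:
            if not all(0.0 <= v <= 1.0 for v in section.bounding_box):
                raise ValueError(f"bounding box not normalized: {section.bounding_box}")


page = Page(
    page=1,
    content="# Variables\n\nEach declaration specifies the variable's type.",
    sections=[Section("# Variables", (0.05, 0.04, 0.9, 0.06))],
)
print(len(page.sections), page.sections[0].bounding_box)
```

Keeping sections as an optional list with an empty default means pages converted without the bounding_box toggle still fit the same model.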
Bounding Box De-Normalization
Bounding boxes are normalized (values between 0 and 1). To de-normalize, multiply the normalized values by the image's dimensions: horizontal values by the image width and vertical values by the image height.
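A minimal sketch of the de-normalization step, assuming each box is a 4-value sequence whose components alternate horizontal and vertical, as in (x, y, width, height) or (x_min, y_min, x_max, y_max). The function name denormalize_bbox is hypothetical and is not part of the PR.

```python
from typing import Sequence, Tuple


def denormalize_bbox(
    bbox: Sequence[float], image_width: int, image_height: int
) -> Tuple[float, float, float, float]:
    """Scale a normalized bounding box (values in [0, 1]) to pixel units.

    Hypothetical helper. Assumes a 4-value box whose components alternate
    horizontal/vertical, e.g. (x, y, width, height) or
    (x_min, y_min, x_max, y_max).
    """
    if len(bbox) != 4:
        raise ValueError(f"expected 4 values, got {len(bbox)}")
    h0, v0, h1, v1 = bbox
    # Horizontal components scale with the image width,
    # vertical components with the image height.
    return (
        h0 * image_width,
        v0 * image_height,
        h1 * image_width,
        v1 * image_height,
    )


# A box on a 1000 x 800 page image.
print(denormalize_bbox((0.1, 0.2, 0.5, 0.25), 1000, 800))
```

Because both common box layouts put horizontal values at even positions and vertical values at odd positions, the same scaling works for either; round the results if you need integer pixel coordinates.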
Usage
Output
Generated Markdown
Each declaration specifies the variable's type followed by the identifier and ending with a semicolon. The identifier rules are fairly standard: a name can consist of lowercase and uppercase alphabetic characters, numbers, and underscores but may not begin with a numeric character. We adopt the modern camelCasing naming convention for variables in our code. In general, variables must be assigned a value before you can use them in an expression. You do not have to immediately assign a value when you declare them (though it is good practice), but some value must be assigned before they can be used or the compiler will issue an error.
The assignment operator is a single equal sign, =, and is a right-to-left assignment. That is, the variable that we wish to assign the value to appears on the left-hand-side while the value (literal, variable or expression) is on the right-hand-side. Using our variables from before, we can assign them values:

2 Instance variables, that is variables declared as part of an object do have default values. For objects, the default is null; for all numeric types, zero is the default value. For the boolean type, false is the default, and the default char value is \0, the null-terminating character (zero in the ASCII table).