Fix/overlapping of bboxes by @benjats07 in Unstructured-IO/unstructured-inference#201
This change makes yolox the default model for element detection and removes duplicated or near-duplicate bounding boxes from the results to reduce noise in the final elements.
fix a bug where padded table structure bounding boxes are not shifted back into the original image coordinates correctly
0.6.2
move the confidence threshold for table transformer to config
0.6.1
YoloX_quantized is now the default model. This model detects the most diverse types and detects tables better than the previous model.
Since detection models tend to nest elements inside others (specifically in tables), an algorithm has been added to reduce this
behavior. Now all the elements produced by detection models are disjoint and do not produce overlapping regions, which helps
reduce duplicated content.
Add source property to our elements, so you can know where the information was generated (OCR or detection model)
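To make the nested-element cleanup described in the 0.6.1 notes more concrete, here is a minimal sketch of one way to drop boxes that are nested in, or nearly duplicate, a larger box. It is not the algorithm used by unstructured-inference: the `Box` tuple layout, the `drop_nested_boxes` name, and the 0.9 threshold are all assumptions made for illustration, and the sketch does not attempt to make partially overlapping boxes disjoint.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def area(box: Box) -> float:
    """Area of a box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (0 if they do not overlap)."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def drop_nested_boxes(boxes: List[Box], threshold: float = 0.9) -> List[Box]:
    """Keep larger boxes first and drop any box that is mostly covered
    by a box that has already been kept.

    `threshold` is an assumed value: the fraction of a box's own area
    that must lie inside a kept box for it to count as nested.
    """
    kept: List[Box] = []
    for box in sorted(boxes, key=area, reverse=True):
        box_area = area(box)
        if box_area == 0:
            continue  # skip degenerate boxes
        if any(intersection_area(box, k) / box_area >= threshold for k in kept):
            continue  # nested in, or a near duplicate of, a kept box
        kept.append(box)
    return kept


page_boxes = [(0, 0, 100, 100), (10, 10, 50, 50), (200, 200, 300, 300)]
print(drop_nested_boxes(page_boxes))
```

Processing boxes from largest to smallest means a cell detected inside a table is compared against the table box, so the outer element survives and the nested one is discarded.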
0.6.0
add a config class to handle parameter configurations for inference tasks; parameters in the config class can be set via environment variables
update behavior of pad_image_with_background_color so that input pad is applied to all sides
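As a rough illustration of the environment-variable-driven config pattern mentioned in the 0.6.0 notes, the sketch below reads a parameter from the environment and falls back to a default. The class name `SketchConfig`, the variable name `SKETCH_TABLE_CONFIDENCE_THRESHOLD`, and the default of 0.5 are hypothetical and are not taken from unstructured-inference.

```python
import os


class SketchConfig:
    """Hypothetical config class: each property re-reads the environment,
    so changing a variable takes effect without re-creating the object."""

    @staticmethod
    def _get_float(var: str, default: float) -> float:
        # Use the environment value if it is set, otherwise the default.
        raw = os.environ.get(var)
        return float(raw) if raw is not None else default

    @property
    def TABLE_CONFIDENCE_THRESHOLD(self) -> float:
        # Assumed parameter name and default, for illustration only.
        return self._get_float("SKETCH_TABLE_CONFIDENCE_THRESHOLD", 0.5)


config = SketchConfig()
os.environ.pop("SKETCH_TABLE_CONFIDENCE_THRESHOLD", None)
default_value = config.TABLE_CONFIDENCE_THRESHOLD

os.environ["SKETCH_TABLE_CONFIDENCE_THRESHOLD"] = "0.8"
overridden_value = config.TABLE_CONFIDENCE_THRESHOLD
print(default_value, overridden_value)
```

Reading the variable on every access, rather than once at import time, is a design choice of this sketch; it keeps tests and scripts able to override a parameter at runtime.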
0.5.31
Add functionality to extract and save images from the page
Add functionality to get only "true" embedded images when extracting elements from PDF pages
Update the layout visualization script to be able to show only image elements if needed
add an evaluation metric for table comparison based on token similarity
fix paddle unit tests where make test fails since paddle doesn't work on M1/M2 chip locally
Commits
cb2aff2 fix: padded boxes are not rescaled/shifted correctly (#229)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Bumps unstructured-inference from 0.5.28 to 0.6.3.
Release notes
Sourced from unstructured-inference's releases.
Changelog
Sourced from unstructured-inference's changelog.
Commits
cb2aff2 fix: padded boxes are not rescaled/shifted correctly (#229)
35ebea7 feat: add pre commit hook (#220)
8c6d669 feat: make table transformer parameters configurable (#224)
eaa8d65 Fix/nested bounding boxes (#201)
5e73202 feat: add config class (#218)
bfdf357 chore: changelog repair (#221)
b9f032c Feat/save embedded images in pdf (#208)
5c2acc4 feat: add evaluation metric for table extraction (#216)
bfb90e3 chore: skip paddle unittests local for mac (#214)