Fix/overlapping of bboxes by @benjats07 in Unstructured-IO/unstructured-inference#201. This change makes yolox the default model for element detection and removes duplicated or near-duplicated bounding boxes from the results to reduce noise in the final elements.
Add functionality to keep extracted image elements while merging inferred layout with extracted layout
Fix source property for elements generated by pdfminer.
Add 'OCR-tesseract' and 'OCR-paddle' as sources for elements generated by OCR.
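The release notes do not say how near-duplicate boxes are detected. A common technique is intersection-over-union (IoU) thresholding, and the sketch below assumes that approach. `drop_near_duplicates` and the 0.9 threshold are hypothetical illustrations, not unstructured-inference's API.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def drop_near_duplicates(boxes: List[Box], threshold: float = 0.9) -> List[Box]:
    """Keep a box only if its IoU with every already-kept box is below `threshold`.

    Hypothetical helper. Larger boxes are visited first, so when two boxes are
    near-duplicates the bigger one survives.
    """
    ordered = sorted(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]), reverse=True)
    kept: List[Box] = []
    for box in ordered:
        if all(iou(box, k) < threshold for k in kept):
            kept.append(box)
    return kept
```

Sorting by area is one design choice among several. A detector-based pipeline might instead keep the box with the higher confidence score.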
0.6.4
add a function to automatically scale table crop images based on text height, so that the text height is optimal for the tesseract OCR task
add the new image auto scaling parameters to config.py
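The auto-scaling idea can be sketched as choosing a scale factor that moves the crop's median text height toward a target value. The parameter names and defaults below (`TARGET_TEXT_HEIGHT`, `MIN_SCALE`, `MAX_SCALE`) are hypothetical stand-ins for whatever config.py actually defines, and `autoscale_size` is an illustrative helper, not the library's function.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical defaults for illustration; the real names and values in config.py are not shown here.
TARGET_TEXT_HEIGHT = 30.0  # desired median text height in pixels
MIN_SCALE = 0.5            # never shrink below half size
MAX_SCALE = 3.0            # never enlarge beyond 3x


def autoscale_size(
    size: Tuple[int, int],
    text_heights: List[float],
    target: float = TARGET_TEXT_HEIGHT,
    min_scale: float = MIN_SCALE,
    max_scale: float = MAX_SCALE,
) -> Tuple[int, int]:
    """Return a (width, height) that brings the crop's median text height close to `target`."""
    width, height = size
    if not text_heights:
        return size  # nothing to measure, keep the crop as-is
    scale = target / median(text_heights)
    scale = max(min_scale, min(scale, max_scale))  # clamp to a sane range
    return (max(1, round(width * scale)), max(1, round(height * scale)))
```

The returned size can then be passed to Pillow's `Image.resize` before running OCR on the crop.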
0.6.3
fix a bug where padded table structure bounding boxes are not shifted back into the original image coordinates correctly
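The coordinate correction amounts to undoing the padding offset and then adding the crop's position in the page. The sketch below assumes the crop was padded by the same amount on every side and was not rescaled; `unpad_box` and `crop_origin` are hypothetical names, not the library's.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def unpad_box(box: Box, pad: float, crop_origin: Tuple[float, float] = (0.0, 0.0)) -> Box:
    """Map a box detected on a padded crop back to original-image coordinates.

    Hypothetical helper: the crop was padded by `pad` pixels on every side, so
    detections are offset by `pad`; the unpadded crop started at `crop_origin`.
    """
    ox, oy = crop_origin
    x1, y1, x2, y2 = box
    return (x1 - pad + ox, y1 - pad + oy, x2 - pad + ox, y2 - pad + oy)
```

If the padded crop is also resized, the boxes would additionally need to be divided by the scale factor before the shift.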
0.6.2
move the confidence threshold for table transformer to config
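Moving the threshold into a config object means it can be overridden without code changes. Below is a minimal sketch of an environment-variable-backed config. The class name `InferenceConfig`, the variable `TABLE_CONFIDENCE_THRESHOLD`, and the 0.5 default are assumptions for illustration; check config.py for the real names and values.

```python
import os
from typing import List


class InferenceConfig:
    """Sketch of an env-var-backed config; names and defaults are hypothetical."""

    @staticmethod
    def _get_float(var: str, default: float) -> float:
        value = os.environ.get(var)
        return float(value) if value is not None else default

    @property
    def table_confidence_threshold(self) -> float:
        # Read on every access, so changing the environment takes effect immediately.
        return self._get_float("TABLE_CONFIDENCE_THRESHOLD", 0.5)


def filter_by_threshold(scores: List[float], config: InferenceConfig) -> List[float]:
    """Keep only detection scores that reach the configured threshold."""
    threshold = config.table_confidence_threshold
    return [s for s in scores if s >= threshold]
```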
0.6.1
YoloX_quantized is now the default model. This model detects the most diverse set of element types and detects tables better than the previous model.
Since detection models tend to nest elements inside others (specifically in tables), an algorithm has been added to reduce this behavior. All elements produced by detection models are now disjoint and do not overlap, which helps reduce duplicated content.
Add a source property to our elements, so you can tell where the information was generated (OCR or detection model)
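The changelog does not describe the de-nesting algorithm. A simple technique that addresses nesting is a containment filter: drop any box that lies almost entirely inside a larger box. The sketch below assumes that approach and a hypothetical 0.9 containment threshold. It handles nesting only; making partially overlapping boxes fully disjoint, as the entry above describes, would need an extra step (for example trimming one box) that is not shown here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def containment(inner: Box, outer: Box) -> float:
    """Fraction of `inner`'s area that lies inside `outer`."""
    ix1, iy1 = max(inner[0], outer[0]), max(inner[1], outer[1])
    ix2, iy2 = min(inner[2], outer[2]), min(inner[3], outer[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a = area(inner)
    return inter / a if a > 0 else 0.0


def remove_nested(boxes: List[Box], threshold: float = 0.9) -> List[Box]:
    """Hypothetical helper: drop boxes that are (almost) fully inside a strictly larger box."""
    result: List[Box] = []
    for i, b in enumerate(boxes):
        nested = any(
            j != i and area(o) > area(b) and containment(b, o) >= threshold
            for j, o in enumerate(boxes)
        )
        if not nested:
            result.append(b)
    return result
```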
0.6.0
add a config class to handle parameter configurations for inference tasks; parameters in the config class can be set via environment variables
update the behavior of pad_image_with_background_color so that the input pad is applied to all sides
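Applying the pad to all sides means a `pad` of p grows both width and height by 2p, with the original pixels offset by p. The sketch below shows that behavior on a plain list-of-rows image so it runs without extra dependencies. `pad_all_sides` is a hypothetical stand-in, not the library's pad_image_with_background_color, and it takes the background color as an explicit argument.

```python
from typing import List, TypeVar

Pixel = TypeVar("Pixel")


def pad_all_sides(image: List[List[Pixel]], pad: int, background: Pixel) -> List[List[Pixel]]:
    """Return a copy of `image` (rows of pixels) with `pad` background pixels on every side."""
    if pad < 0:
        raise ValueError("pad must be non-negative")
    width = len(image[0]) if image else 0
    new_width = width + 2 * pad
    top = [[background] * new_width for _ in range(pad)]
    middle = [[background] * pad + list(row) + [background] * pad for row in image]
    bottom = [[background] * new_width for _ in range(pad)]
    return top + middle + bottom
```

Because the original content ends up shifted by `pad` in both directions, any boxes detected on the padded image have to be shifted back by the same amount.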
0.5.31
Add functionality to extract and save images from the page
Add functionality to get only "true" embedded images when extracting elements from PDF pages
Update the layout visualization script to be able to show only image elements if needed
add an evaluation metric for table comparison based on token similarity
fix paddle unit tests where `make test` fails locally because paddle doesn't work on M1/M2 chips
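A token-similarity table metric can be sketched by flattening each table into a token sequence and comparing the sequences. The version below uses Python's `difflib.SequenceMatcher` ratio as the similarity measure. That choice, and the helper names `table_tokens` and `token_similarity`, are assumptions; the changelog does not say which similarity function the real metric uses.

```python
from difflib import SequenceMatcher
from typing import List

Table = List[List[str]]  # rows of cell strings


def table_tokens(table: Table) -> List[str]:
    """Flatten a table into whitespace-separated tokens in row-major order."""
    return [tok for row in table for cell in row for tok in cell.split()]


def token_similarity(predicted: Table, ground_truth: Table) -> float:
    """Similarity in [0, 1] between the token sequences of two tables."""
    pred, truth = table_tokens(predicted), table_tokens(ground_truth)
    if not pred and not truth:
        return 1.0  # two empty tables are treated as identical
    return SequenceMatcher(None, pred, truth, autojunk=False).ratio()
```

Comparing tokens rather than raw cells makes the score tolerant of cell-boundary mistakes (a cell split in two still yields the same tokens), at the cost of ignoring table structure.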
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually.
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Bumps unstructured-inference from 0.5.28 to 0.6.5.
Release notes
Sourced from unstructured-inference's releases.
Changelog
Sourced from unstructured-inference's changelog.
Commits
- 12ca9d9 chore: changelog fix, cut release 0.6.5 (#230)
- 00b4936 Feat/219 keep extracted image elements (#225)
- f4236c8 Fix/pdf miner source property (#228)
- c4d3e8b feat: add autoscaling for table images (#210)
- cb2aff2 fix: padded boxes are not rescaled/shifted correctly (#229)
- 35ebea7 feat: add pre commit hook (#220)
- 8c6d669 feat: make table transformer parameters configurable (#224)
- eaa8d65 Fix/nested bounding boxes (#201)
- 5e73202 feat: add config class (#218)
- bfdf357 chore: changelog repair (#221)