Unstructured-IO / unstructured-api

Apache License 2.0

build(deps): bump unstructured-inference from 0.5.28 to 0.6.5 in /requirements #257

Closed dependabot[bot] closed 1 year ago

dependabot[bot] commented 1 year ago

Bumps unstructured-inference from 0.5.28 to 0.6.5.

Release notes

Sourced from unstructured-inference's releases.

0.6.5

  • Add functionality to keep extracted image elements while merging inferred layout with extracted layout
  • Fix source property for elements generated by pdfminer.
  • Add 'OCR-tesseract' and 'OCR-paddle' as sources for elements generated by OCR.

0.6.3

What's Changed

Bug fixes

Full Changelog: https://github.com/Unstructured-IO/unstructured-inference/compare/0.6.1...0.6.3

0.6.1

What's Changed

Full Changelog: https://github.com/Unstructured-IO/unstructured-inference/compare/0.5.31...0.6.1

0.5.31

  • Add functionality to extract and save images from the page
  • Add functionality to get only "true" embedded images when extracting elements from PDF pages
  • Update the layout visualization script to be able to show only image elements if needed
  • add an evaluation metric for table comparison based on token similarity
  • fix paddle unit tests where make test fails since paddle doesn't work on M1/M2 chip locally

Changelog

Sourced from unstructured-inference's changelog.

0.6.5

  • Add functionality to keep extracted image elements while merging inferred layout with extracted layout
  • Fix source property for elements generated by pdfminer.
  • Add 'OCR-tesseract' and 'OCR-paddle' as sources for elements generated by OCR.

0.6.4

  • add a function to automatically scale table crop images based on text height so the text height is optimum for tesseract OCR task
  • add the new image auto scaling parameters to config.py
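
The scaling idea in 0.6.4 can be illustrated with a minimal sketch. It assumes the text height of a table crop has already been measured in pixels. The function names, the 24-pixel target, and the clamp bounds are hypothetical and are not taken from unstructured-inference or its config.py.

```python
from typing import Tuple


def scale_factor_for_text_height(
    measured_text_height: float,
    target_text_height: float = 24.0,  # hypothetical target height in pixels
    min_scale: float = 0.5,            # hypothetical lower clamp
    max_scale: float = 4.0,            # hypothetical upper clamp
) -> float:
    """Return the resize factor that brings the measured text height to the target."""
    if measured_text_height <= 0:
        return 1.0  # nothing measurable, so leave the crop unscaled
    scale = target_text_height / measured_text_height
    return max(min_scale, min(max_scale, scale))


def scaled_size(width: int, height: int, scale: float) -> Tuple[int, int]:
    """Apply the factor to an image size, keeping each side at least 1 pixel."""
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Clamping keeps very small or very large measured heights from producing extreme resizes. The actual parameters used by the library may differ.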

0.6.3

  • fix a bug where padded table structure bounding boxes are not shifted back into the original image coordinates correctly
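
The kind of coordinate shift described in 0.6.3 can be sketched as follows. This assumes the same padding was added on every side of the image before detection. The `Box` alias and `unpad_box` helper are hypothetical names for illustration, not the library's API.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def unpad_box(box: Box, pad: float) -> Box:
    """Shift a box found on a padded image back into original image coordinates.

    Assumes `pad` pixels were added on all four sides, so every coordinate
    is offset by exactly `pad`.
    """
    x1, y1, x2, y2 = box
    return (x1 - pad, y1 - pad, x2 - pad, y2 - pad)
```

For example, with a padding of 10, a box at (15, 25, 115, 65) on the padded image maps to (5, 15, 105, 55) on the original.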

0.6.2

  • move the confidence threshold for table transformer to config

0.6.1

  • YoloX_quantized is now the default model. This model detects a more diverse set of element types and detects tables better than the previous model.
  • Since detection models tend to nest elements inside others (specifically in tables), an algorithm has been added to reduce this behavior. Now all the elements produced by detection models are disjoint and do not overlap, which helps reduce duplicated content.
  • Add source property to our elements, so you can know where the information was generated (OCR or detection model)
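
One simple way to reduce nested detections is sketched below. It is not the algorithm shipped in unstructured-inference, which the changelog does not describe. It only drops boxes that lie almost entirely inside a larger box that was already kept. The helper names and the 0.9 threshold are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection_area(a: Box, b: Box) -> float:
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)


def drop_nested(boxes: List[Box], threshold: float = 0.9) -> List[Box]:
    """Keep larger boxes first; drop a box if a kept box covers most of its area."""
    kept: List[Box] = []
    for box in sorted(boxes, key=area, reverse=True):
        box_area = area(box)
        if box_area == 0:
            continue  # skip degenerate boxes
        covered = any(
            intersection_area(box, other) / box_area >= threshold for other in kept
        )
        if not covered:
            kept.append(box)
    return kept
```

Boxes that only partially overlap are kept unchanged by this sketch, so it reduces nesting but does not by itself guarantee fully disjoint regions.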

0.6.0

  • add a config class to handle parameter configurations for inference tasks; parameters in the config class can be set via environment variables
  • update behavior of pad_image_with_background_color so that input pad is applied to all sides
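
The config pattern from 0.6.0 might look roughly like the sketch below, where each parameter is read from an environment variable with a fallback default. The class name, property name, variable name, and default value are all hypothetical and do not reflect the actual contents of the library's config module.

```python
import os


class InferenceConfigSketch:
    """Hypothetical config object; values are read from the environment on access."""

    def _get_float(self, name: str, default: float) -> float:
        # Fall back to the default when the variable is unset.
        return float(os.environ.get(name, default))

    @property
    def table_confidence_threshold(self) -> float:
        # EXAMPLE_TABLE_CONFIDENCE_THRESHOLD is an illustrative variable name.
        return self._get_float("EXAMPLE_TABLE_CONFIDENCE_THRESHOLD", 0.5)


config = InferenceConfigSketch()
```

Reading the environment at access time rather than at import time means a value exported after the module is loaded still takes effect.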

0.5.31

  • Add functionality to extract and save images from the page
  • Add functionality to get only "true" embedded images when extracting elements from PDF pages
  • Update the layout visualization script to be able to show only image elements if needed
  • add an evaluation metric for table comparison based on token similarity
  • fix paddle unit tests where make test fails since paddle doesn't work on M1/M2 chip locally

Commits


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

awalker4 commented 1 year ago

@dependabot close