Unstructured-IO
/
unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.8k
stars
626
forks
issues
bug/combineUnderNChars not working properly
#3138
leSullivan
closed
1 month ago
6
feat/Allow max-pages/max-total-characters that should be parsed
#3137
abdullahbaa5
opened
1 month ago
2
build(deps): bump matplotlib from 3.7.2 to 3.9.0 in /requirements
#3136
dependabot[bot]
closed
1 month ago
1
build(deps): bump dropboxdrivefs from 1.4.0 to 1.4.1 in /requirements
#3135
dependabot[bot]
closed
1 month ago
1
build(deps): bump python-gitlab from 4.5.0 to 4.6.0 in /requirements
#3134
dependabot[bot]
closed
1 month ago
1
build(deps): bump slackapi/slack-github-action from 1.24.0 to 1.26.0
#3133
dependabot[bot]
closed
1 month ago
1
build(deps): bump actions/cache from 3 to 4
#3132
dependabot[bot]
closed
1 month ago
1
build(deps): bump peaceiris/actions-gh-pages from 3 to 4
#3131
dependabot[bot]
closed
1 month ago
1
fix: parsing pdf error - new_cells as str has no "copy"
#3130
christinestraub
closed
1 month ago
2
ci: remove jira issue workflow
#3129
MthwRobinson
closed
1 month ago
0
fix: remote root handlers when they exist
#3128
MthwRobinson
closed
1 month ago
0
ci: update `pinecone` test to use serverless
#3127
MthwRobinson
closed
1 month ago
0
feat: configure googlevisionapi
#3126
MthwRobinson
closed
1 month ago
0
Not respecting NLTK_DATA environment variable
#3125
TaylorN15
closed
1 month ago
4
enhancement: apply tar filters when using python 3.12 or above
#3124
MthwRobinson
closed
1 month ago
0
bug/docker images at quay.io not up to date
#3123
jpabbuehl
closed
2 weeks ago
10
chore: add auth to s3 destination test
#3122
ryannikolaidis
closed
1 month ago
0
bug/<pdfminer>
#3121
Tilemachoc
closed
1 month ago
1
docs: cleanup readme; add python 3.12
#3120
MthwRobinson
closed
1 month ago
1
bug/parsing pdf error - new_cells as str has no "copy"
#3119
mpierangeli-q99
closed
1 month ago
6
docs: explicitly replace all old pages with link to new docs
#3118
MthwRobinson
closed
1 month ago
0
feat: configure googlevision api endpoint
#3117
dlozeve
closed
1 month ago
2
bug/partition_html outputs different results with different args
#3116
KMayank29
opened
1 month ago
5
docs: make 404 pages same as index
#3114
MthwRobinson
closed
1 month ago
0
fix: remove 404 from docs
#3112
MthwRobinson
closed
1 month ago
0
[DRAFT] fix: revert dropping of filename extension for some connectors <- Ingest test fixtures update
#3111
ryannikolaidis
closed
1 month ago
0
build: pin python-docx
#3110
qued
closed
1 month ago
0
fix: revert dropping of filename extension for some connectors
#3109
ryannikolaidis
closed
1 month ago
0
enhancement: make tempfiles windows friendly
#3108
Blaxzter
closed
1 month ago
10
rfctr: clean MSG partitioner and tests as prep
#3107
scanny
closed
1 month ago
0
fix: `partition_pdf()` removes spaces from the text
#3106
christinestraub
closed
1 month ago
0
`partition_doc` fails the first time it is run in the AMD64 container
#3105
MthwRobinson
closed
1 month ago
4
fix: uninstall bson for mongo connector
#3104
MthwRobinson
closed
1 month ago
0
DOCX doesn't recognize listitems within textbox
#3103
veredmm
opened
2 months ago
7
bug/PIL.UnidentifiedImageError: cannot identify image file
#3102
udit-pandey-1
opened
2 months ago
13
unstructured-ingest s3 command causes Fsspec.Downloader.download_config.download_dir to be None
#3101
tuvalusoftware
opened
2 months ago
1
bug/bounding boxes using strategy="hi_res" are wrong
#3100
mandar-karhade
opened
2 months ago
4
feat: add VoyageAI embeddings (#3069)
#3099
MthwRobinson
closed
2 months ago
2
CORE-5030 gpt-4o ocr adam
#3098
amaciaszek-dsai
opened
2 months ago
0
Unable to load file
#3097
vlavorini
opened
2 months ago
6
feat(unstructured/partition/docx.py): Fix Compatibility Issue with Chinese Text in Document Parsing
#3096
JIAQIA
closed
1 month ago
6
chore: reduce excessive logging
#3095
badGarnet
closed
2 months ago
0
fix: disable table_as_cells output by default <- Ingest test fixtures update
#3094
ryannikolaidis
closed
2 months ago
0
fix: disable table_as_cells output by default
#3093
badGarnet
closed
2 months ago
0
fix: add missing params to ElementMetadata
#3092
scanny
closed
2 months ago
0
fix: decide table extraction <- Ingest test fixtures update
#3091
ryannikolaidis
closed
2 months ago
0
fix: decide table extraction
#3090
christinestraub
closed
2 months ago
0
[Merge request] bug fix on table structure metric
#3089
tbs17
closed
1 month ago
2
fix: set `resolve_entities=False` in `partition_xml`
#3088
MthwRobinson
closed
2 months ago
0
bug/ImportError: cannot import name 'CompositeElement' from 'unstructured.documents.elements'
#3087
mahmoudaymo
closed
2 months ago
1