Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.54k stars, 595 forks
issues
#3127 | ci: update `pinecone` test to use serverless | MthwRobinson | closed 1 month ago | 0 comments
#3126 | feat: configure googlevisionapi | MthwRobinson | closed 1 month ago | 0 comments
#3125 | Not respecting NLTK_DATA environment variable | TaylorN15 | closed 1 month ago | 4 comments
#3124 | enhancement: apply tar filters when using python 3.12 or above | MthwRobinson | closed 1 month ago | 0 comments
#3123 | bug/docker images at quay.io not up to date | jpabbuehl | closed 1 day ago | 10 comments
#3122 | chore: add auth to s3 destination test | ryannikolaidis | closed 1 month ago | 0 comments
#3121 | bug/<pdfminer> | Tilemachoc | closed 1 month ago | 1 comment
#3120 | docs: cleanup readme; add python 3.12 | MthwRobinson | closed 1 month ago | 1 comment
#3119 | bug/parsing pdf error - new_cells as str has no "copy" | mpierangeli-q99 | closed 1 month ago | 6 comments
#3118 | docs: explicitly replace all old pages with link to new docs | MthwRobinson | closed 1 month ago | 0 comments
#3117 | feat: configure googlevision api endpoint | dlozeve | closed 1 month ago | 2 comments
#3116 | bug/partition_html outputs different results with different args | KMayank29 | opened 1 month ago | 5 comments
#3114 | docs: make 404 pages same as index | MthwRobinson | closed 1 month ago | 0 comments
#3112 | fix: remove 404 from docs | MthwRobinson | closed 1 month ago | 0 comments
#3111 | [DRAFT] fix: revert dropping of filename extension for some connectors <- Ingest test fixtures update | ryannikolaidis | closed 1 month ago | 0 comments
#3110 | build: pin python-docx | qued | closed 1 month ago | 0 comments
#3109 | fix: revert dropping of filename extension for some connectors | ryannikolaidis | closed 1 month ago | 0 comments
#3108 | enhancement: make tempfiles windows friendly | Blaxzter | closed 3 weeks ago | 10 comments
#3107 | rfctr: clean MSG partitioner and tests as prep | scanny | closed 1 month ago | 0 comments
#3106 | fix: `partition_pdf()` removes spaces from the text | christinestraub | closed 1 month ago | 0 comments
#3105 | `partition_doc` fails the first time it is run in the AMD64 container | MthwRobinson | closed 3 weeks ago | 4 comments
#3104 | fix: uninstall bson for mongo connector | MthwRobinson | closed 1 month ago | 0 comments
#3103 | DOCX doesn't recognize listitems within textbox | veredmm | opened 1 month ago | 7 comments
#3102 | bug/PIL.UnidentifiedImageError: cannot identify image file | udit-pandey-1 | opened 1 month ago | 13 comments
#3101 | unstructured-ingest s3 command causes Fsspec.Downloader.download_config.download_dir to be None | tuvalusoftware | opened 1 month ago | 1 comment
#3100 | bug/bounding boxes using strategy="hi_res" are wrong | mandar-karhade | opened 1 month ago | 3 comments
#3099 | feat: add VoyageAI embeddings (#3069) | MthwRobinson | closed 1 month ago | 2 comments
#3098 | CORE-5030 gpt-4o ocr adam | amaciaszek-dsai | opened 1 month ago | 0 comments
#3097 | Unable to load file | vlavorini | opened 1 month ago | 6 comments
#3096 | ### feat(unstructured/partition/docx.py): Fix Compatibility Issue with Chinese Text in Document Parsing | JIAQIA | closed 1 month ago | 6 comments
#3095 | chore: reduce excessive logging | badGarnet | closed 1 month ago | 0 comments
#3094 | fix: disable table_as_cells output by default <- Ingest test fixtures update | ryannikolaidis | closed 1 month ago | 0 comments
#3093 | fix: disable table_as_cells output by default | badGarnet | closed 1 month ago | 0 comments
#3092 | fix: add missing params to ElementMetadata | scanny | closed 1 month ago | 0 comments
#3091 | fix: decide table extraction <- Ingest test fixtures update | ryannikolaidis | closed 1 month ago | 0 comments
#3090 | fix: decide table extraction | christinestraub | closed 1 month ago | 0 comments
#3089 | [Merge request] bug fix on table structure metric | tbs17 | closed 4 weeks ago | 2 comments
#3088 | fix: set `resolve_entities=False` in `partition_xml` | MthwRobinson | closed 1 month ago | 0 comments
#3087 | ImportError: cannot import name 'CompositeElement' from 'unstructured.documents.elements'bug/<short-name> | mahmoudaymo | closed 1 month ago | 1 comment
#3086 | Fix: Chroma Upsert instead of Add | potter-potter | closed 1 month ago | 1 comment
#3085 | fix: added the missing function argument | MillCheck | closed 1 month ago | 0 comments
#3084 | bug/<Compatibility Issue with Chinese Text in Document Parsing> | JIAQIA | opened 1 month ago | 4 comments
#3083 | build: bump amd64 image to python 3.12 | MthwRobinson | closed 1 month ago | 1 comment
#3082 | pptx initial error | OtokoNoIzumi | closed 1 month ago | 1 comment
#3081 | feat(docx): add pluggable picture sub-partitioner | scanny | closed 1 month ago | 0 comments
#3080 | CORE-5030 - gpt4o ocr adam | amaciaszek-dsai | closed 1 month ago | 0 comments
#3079 | feat/custom-metadata | streamnsight | opened 1 month ago | 6 comments
#3078 | Set `resolve_entities=False` by default in `lxml` parser for `partition_xml` | MthwRobinson | closed 1 month ago | 0 comments
#3077 | fix: revert back to old requirements file for sphinx docs | MthwRobinson | closed 1 month ago | 0 comments
#3076 | bug/windows reopen temp file (pdf hi_res) | KristianMischke | opened 1 month ago | 1 comment