Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.44k stars
580 forks
issues
Newest
build(deps): deltalake bump to `0.18.x`
#3197
MthwRobinson
closed
2 weeks ago
0
feat: table evaluations for fixed html table generation
#3196
pawel-kmiecik
closed
2 weeks ago
0
Feat/migrate elasticsearch src connector <- Ingest test fixtures update
#3195
ryannikolaidis
closed
3 weeks ago
0
feat/less strict Python version
#3193
egeres
closed
2 days ago
4
bug/unstructured.paddleocr is not compatible with GPU version of PaddleOCR
#3191
peixin-lin
opened
3 weeks ago
1
fix: dropbox source connector file path bugs <- Ingest test fixtures update
#3190
ryannikolaidis
closed
3 weeks ago
0
fix: dropbox source connector file path bugs
#3189
ryannikolaidis
closed
2 weeks ago
0
fix: relative path / permissions issues with v2 fsspec connectors <- Ingest test fixtures update
#3188
ryannikolaidis
closed
3 weeks ago
0
fix: relative path / permissions issues with v2 fsspec connectors
#3186
ryannikolaidis
closed
3 weeks ago
0
Feat/migrate elasticsearch src connector <- Ingest test fixtures update
#3185
ryannikolaidis
closed
3 weeks ago
0
rfctr(html): drop page concept
#3184
scanny
closed
2 weeks ago
0
Suggestion: include consolidated bounding box coordinates in chunk metadata when using "by_title" chunking strategy
#3194
NikitaKemarskiy
opened
3 weeks ago
1
fix: run `libreoffice` once during wolfi image build
#3183
MthwRobinson
closed
3 weeks ago
1
bug/pdf extraction error when strategy not set
#3187
pk-lit
closed
2 days ago
2
feat: skip ocr for certain element types (Issue #3163)
#3182
beez2022
opened
3 weeks ago
2
Unstructured paid API stuck again
#3181
Neel-132
closed
3 weeks ago
1
rfctr(html): break coupling to DocumentLayout
#3180
scanny
closed
3 weeks ago
1
FIX: Use proper Astra env vars name to match internal docs
#3179
erichare
closed
1 week ago
3
Sensitive data security issues
#3178
arkim822
closed
3 weeks ago
2
rfctr(html): promote HTMLDoc candidate methods
#3177
scanny
closed
3 weeks ago
0
feat: Kafka source and destination connector
#3176
potter-potter
closed
1 week ago
0
feat/table element coordinates
#3175
naunidh-tetrix
opened
3 weeks ago
0
Feat/migrate elasticsearch src connector
#3174
rbiseck3
closed
2 weeks ago
0
Bump to `deltalake>=0.18.x`
#3173
MthwRobinson
closed
2 weeks ago
0
TypeError: UnstructuredClient.__init__() got an unexpected keyword argument 'retry_connection_errors'
#3172
Neel-132
closed
3 weeks ago
4
feat/databricks volumes src
#3171
rbiseck3
opened
3 weeks ago
0
build(deps): weekly dependency bumps (6/10/2024)
#3170
MthwRobinson
closed
3 weeks ago
0
Max retries exceeded. Unstructured API is stuck.
#3169
Neel-132
closed
3 weeks ago
5
Issue in partition_html and chunk_by_title
#3168
pss-123
opened
3 weeks ago
3
bug/Failure to recognize footer and page number, incorrectly classifies as a Narrative text
#3167
tanzeel291994
closed
3 weeks ago
1
Add ability to pass pipeline param to Elasticsearch connector
#3166
aag6z
opened
3 weeks ago
1
rfctr(html): drop now dead XMLDocument and Document
#3165
scanny
closed
3 weeks ago
0
build: 0.14.5 release
#3164
MthwRobinson
closed
3 weeks ago
0
feat/skip ocr for certain element types
#3163
beez2022
opened
3 weeks ago
4
rfctr(html): improve SNR in HTMLDocument
#3162
scanny
closed
3 weeks ago
0
rfctr(html): organize and improve HTMLDocument tests
#3161
scanny
closed
3 weeks ago
0
feat: migrate weaviate connector to new framework
#3160
rbiseck3
closed
3 weeks ago
0
bug/language specification does not work for PaddleOCR agent
#3159
peixin-lin
opened
3 weeks ago
2
LangChain + Unstructured: Failed to load file ${filePath} using unstructured loader.
#3158
ajaykrupalk
closed
2 weeks ago
3
rfctr(msg): remove temporary new_msg.py
#3157
scanny
closed
3 weeks ago
0
rfctr(html): clean html tests in prep for PRs to follow
#3156
scanny
closed
3 weeks ago
0
Local API Error: `by_similarity` Chunking Strategy Not Recognized
#3155
eduardolundgren
closed
3 weeks ago
1
fix API-297: List parameters incorrectly passed to API requests
#3154
ds-filipknefel
closed
3 weeks ago
0
Salesforce/ source connector - Not able to ingest salesforce files
#3153
mogith-pn
opened
4 weeks ago
1
chore: use python3 consistently in makefile
#3152
badGarnet
closed
4 weeks ago
0
chore: Weaviate pyv4 example
#3151
dudanogueira
closed
3 weeks ago
0
Parsing HTML files
#3150
vinodhsiyer20
closed
2 weeks ago
5
feat/Excluding Specific Types
#3149
tevfikcagridural
opened
4 weeks ago
0
feat: Migrate over fsspec connectors <- Ingest test fixtures update
#3148
ryannikolaidis
closed
4 weeks ago
0
build(deps): weekly pip version bump
#3147
MthwRobinson
closed
4 weeks ago
0