Unstructured-IO/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
7.33k stars
567 forks
issues
Newest
Bugfix/ingest pipeline check
#3303
rbiseck3
opened
11 minutes ago
0
rfctr [P6M-397]: opensearch source connector v2
#3302
potter-potter
opened
3 hours ago
0
feat/more conservative ingest logging
#3301
rbiseck3
closed
36 minutes ago
0
Fix not counting false negatives and false positives in table metrics
#3300
plutasnyy
opened
9 hours ago
0
feat/extract_pdf_page_images
#3299
huanji1987
opened
18 hours ago
0
revert unstructured-client pin and make pip-compile
#3298
Coniferish
opened
20 hours ago
0
feat/migrate onedrive src <- Ingest test fixtures update
#3297
ryannikolaidis
closed
6 hours ago
0
build: move numpy pin to packaging
#3296
qued
closed
20 hours ago
0
feat/migrate onedrive src
#3295
rbiseck3
opened
22 hours ago
0
feat/migrate astra db
#3294
rbiseck3
closed
23 hours ago
1
rfct [P6M]-392: OpenSearch V2 Destination Connector
#3293
potter-potter
opened
1 day ago
0
Couchbase vector store support as destination and source connector
#3292
lokesh-couchbase
opened
1 day ago
1
File parsing CPU cores
#3291
shuaihutianxie
opened
1 day ago
0
bug/docker_tesseract_missing
#3290
neilkumar
opened
1 day ago
6
bug/poetry-in-dockerfile
#3289
MattiaCinelli
opened
1 day ago
0
Clean up table transformer warning statements
#3288
magallardo
opened
1 day ago
5
fix: wait to run soffice until there is no other soffice process running
#3287
badGarnet
closed
22 hours ago
0
feat: add v2 pinecone destination connector
#3286
ahmetmeleq
opened
1 day ago
0
feat/migrate gdrive source connector <- Ingest test fixtures update
#3285
ryannikolaidis
closed
1 day ago
0
bug/<Ingestion error to process attachments for .msg files>
#3284
mahmoudaymo
closed
22 hours ago
5
bug/<short-name>
#3283
rs-03
closed
1 day ago
2
fix: add arch into build images
#3282
MthwRobinson
closed
2 days ago
1
Return image data from confluence
#3281
ML-Abdula
opened
2 days ago
4
List block in a partitioned Markdown doc identified as a `Title` element under special conditions
#3280
nickphilip
opened
2 days ago
0
UnstructuredImageLoader: when using OCRAgentPaddle, how to set the OCR memory size, and where to download the model/language packs
#3279
haike-1213
opened
3 days ago
0
fix(auto): partition() passes strategy to DOC,ODT
#3278
scanny
closed
17 hours ago
1
chore: Add test that tests all the different file types in example-docs
#3277
potter-potter
opened
4 days ago
0
Minor ocr_interface.py Error Handling Improvement
#3276
AscendingGrass
opened
4 days ago
1
chore: bump unstructured-inference 0.7.36
#3275
christinestraub
closed
2 days ago
1
Feat/pass down strategy to partition ppt as well
#3274
badGarnet
closed
4 days ago
1
fix(auto): partition() passes strategy to PPTX,DOCX
#3273
scanny
closed
4 days ago
1
build: fix amd64 image hash
#3272
MthwRobinson
closed
4 days ago
0
feat/code-snippets-context
#3271
asm0dey
opened
5 days ago
2
fix: update base image SHA for amd64 wolfi
#3270
christinestraub
closed
4 days ago
0
WIP: Ml 89/od metrics
#3269
mariannaparzych
opened
5 days ago
0
build: switch arm64 image to wolfi-base
#3268
MthwRobinson
closed
4 days ago
3
Compatibility Issue with Chinese Text in Document Parsing
#3267
JIAQIA
opened
5 days ago
1
feat/migrate gdrive source connector <- Ingest test fixtures update
#3266
ryannikolaidis
closed
5 days ago
0
Is there a way to convert text files to markdown format ?
#3265
shamanez
closed
4 days ago
5
Roman/bugfix conflicting event loop ingest
#3264
rbiseck3
closed
1 day ago
0
Fix missing sensitive fields for embedders
#3263
vangheem
closed
2 days ago
0
bug/<tables getting cut off at the edges when using hi res strategy>
#3262
rchen19
opened
5 days ago
1
feat/migrate gdrive source connector <- Ingest test fixtures update
#3261
ryannikolaidis
closed
5 days ago
0
Connection Error
#3260
ishansuhail
opened
6 days ago
3
build: version bump for release 0.14.7
#3259
MthwRobinson
closed
5 days ago
0
Feat: Add-rc-locator-to-partition-excel
#3258
marctorsoc
opened
6 days ago
0
rfctr(html): prepare for new html parser
#3257
scanny
closed
4 days ago
1
BUG - PPTX doesn't recognize text within slide notes
#3256
veredmm
closed
6 days ago
2
bug(html): invisible links are reported in metadata
#3255
scanny
opened
6 days ago
0
chore: Add markdown table support to Table element constructor
#3254
oguzhan1907
closed
1 week ago
0