-
**Describe the bug**
User gets a `TesseractError` when processing a particular document.
**To Reproduce**
Code was an API call with a certain image-based document.
**Expected behavior**
Docum…
-
The provided datasets come in four variants, each serving a specific purpose, and contain a `text_description` as described below, e.g. for gov:
1. **syntheticDocQA_government_reports_test** – **No text_des…
-
Hello,
Is there a way to install the library unstructured[pdf] in a lightweight form, just to use the "fast" strategy without all the other dependencies?
Thank you in advance for your support.
-
We use a logger in a lot of places in this repo, which is good! To make this structured logging as useful as it can be, the `event` argument (usually the first unnamed arg) should be a static string …
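
As a minimal sketch of the idea, assuming Python's standard `logging` module rather than this repo's actual logger (which is not shown here): keep the event string static and pass the variable parts as separate fields. The logger name `example` and the field names `doc_name` and `page_count` are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Collects records in memory so the example can inspect them."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("example")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

doc_name = "report.pdf"  # hypothetical values
page_count = 12

# Harder to aggregate: the event text is different on every call.
logger.info(f"Partitioned {doc_name} with {page_count} pages")

# Easier to aggregate: static event string, variable data in separate fields.
logger.info(
    "partitioned document",
    extra={"doc_name": doc_name, "page_count": page_count},
)

dynamic_record, static_record = handler.records
```

With the static form, every occurrence shares the same event text, so log searches and counts can group on it while the per-call details remain available as fields.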
-
```
in extract_data_from_pdf(pdf_path)
57 # Function to extract text using the unstructured library
58 def extract_data_from_pdf(pdf_path):
---> 59 eleme…
```
-
I would like to add custom metadata to chunks when they are saved to Pinecone with `Pipeline.from_configs`.
Following the 'Custom meta data extraction ...' notebook on [this page](https://docs.unstructured.io…
-
I have logs that are mostly JSON, but some logs come from system calls that can't be structured. It would be nice if I could set a rule that allowed me to capture the whole log and put it for example …
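
To make the requested behavior concrete, here is a rough sketch in plain Python, not tied to any particular logging tool. The function name `parse_log_line` and the fallback field name `message` are hypothetical, chosen only for illustration.

```python
import json


def parse_log_line(line, fallback_field="message"):
    """Return a dict for every line.

    JSON objects are returned as parsed; anything else is kept whole
    under `fallback_field` so no log line is dropped.
    """
    try:
        parsed = json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        parsed = None
    if isinstance(parsed, dict):
        return parsed
    # Not a JSON object: capture the entire raw line in a single field.
    return {fallback_field: line.rstrip("\n")}


structured = parse_log_line('{"level": "info", "msg": "ok"}')
unstructured = parse_log_line("kernel: out of memory\n")
```

Downstream consumers then always receive a dictionary, whether or not the original line was structured.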
-
My custom image works as expected when run locally against a `test.docx` from an s3 path.
But when I upload the image to Lambda, I get the error `BadZipFile: Bad magic number for central directory`…
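
Since a `.docx` file is a zip container, one way to narrow this down is to check whether the bytes that reach the function still open as a zip archive. The sketch below uses only the standard library; the helper name `is_readable_zip` is hypothetical, and it does not identify the cause, only whether the payload arrives intact.

```python
import io
import zipfile


def is_readable_zip(data: bytes) -> bool:
    """Return True if `data` can be opened as a zip archive."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            archive.namelist()  # forces the central directory to be read
        return True
    except zipfile.BadZipFile:
        return False


# Build a tiny in-memory archive to exercise the helper.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<doc/>")
good_bytes = buffer.getvalue()
truncated_bytes = good_bytes[: len(good_bytes) // 2]
```

Running the same check locally and in the deployed environment on the same object would show whether the file content differs between the two.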
-
Been loving ElectroDB, thanks for working on it!
Is there a way to model unstructured/partially structured maps in an entity schema? The use case is for complex and potentially large json objects (…
-
While reading HTML files, we ran into a problem: we end up with an empty list.
Here is a small example:
```python
from unstructured.partition.html import partition_html
html_content="…