-
A new regex pattern was needed for BOM statements.
An additional function was created to house the pattern.
The regex works, but it appears to get tripped up when the text is `1(page 1 of x)`.
…
emdeh updated
2 months ago
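One way to handle the `1(page 1 of x)` case is to strip page markers before the BOM statement pattern runs, so the pattern never sees them. The sketch below rests on two assumptions. First, the marker always has the form `(page N of M)`. Second, the marker can be glued directly to the preceding token. The names `PAGE_MARKER` and `strip_page_markers` are hypothetical and do not come from the issue.

```python
import re

# Hypothetical pattern: matches a "(page N of M)" marker, with or without
# leading whitespace, so "1(page 1 of x)" and "1 (Page 2 of 10)" both match.
# M is \w+ so a placeholder such as a literal "x" is also accepted.
PAGE_MARKER = re.compile(r"\s*\(page\s+\d+\s+of\s+\w+\)", re.IGNORECASE)


def strip_page_markers(line: str) -> str:
    """Remove page markers so a downstream BOM statement regex only sees content."""
    return PAGE_MARKER.sub("", line)
```

For example, `strip_page_markers("1(page 1 of x)")` leaves just `"1"` for the statement pattern to match.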
-
```python
from unstructured.ingest.connector.confluence import ConfluenceAccessConfig, SimpleConfluenceConfig
from unstructured.ingest.interfaces import PartitionConfig, ProcessorConfig, ReadConfig
from …
```
-
### Thank you for submitting a possible bug!
Please ensure the following:
* Your issue is based on the latest commit ✅ yes
* State your OS and OS version ✅ Windows 11 22H2
* When reporting…
-
Hello,
I would like to formally ask the team currently in charge of the DFU code to accept the introduction of code into DFU that would allow mods in languages other than English to use a grammar …
-
Hi Team,
I am trying to extract text with Document AI from a PDF file stored in a Google Cloud Storage bucket.
I can extract text when I process the PDF in the Google Cloud console. However, when I am …
-
I have followed the tutorial exactly, and I keep getting this error. My key.json is located in `home/jonathan_ruben_fernandes/key.json`.
Here is my error message. Nothing on Stack Overflow and OpenAI …
-
Found while working on #2264:
> The post processor will then be run on the output file, which is a PDF. Until recently, `pdf_document()` had no post processor, and it seemed to work. But there is one today…
cderv updated
2 years ago
-
Explore switching the PDF Splitter from pikepdf to PyMuPDF.
See whether efficiency and code readability improve.
https://pymupdf.readthedocs.io/en/latest/about.html
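The following sketch shows what a PyMuPDF-based splitter might look like, as a point of comparison for readability. It assumes the splitter writes one output file per fixed-size page range. The helper names `page_chunks` and `split_pdf_pymupdf`, and the output naming scheme, are hypothetical. The PyMuPDF calls it uses are `fitz.open`, `Document.insert_pdf(..., from_page=, to_page=)`, `Document.save`, and `Document.page_count`. PyMuPDF is imported lazily, so the page-range logic can be reused or tested without it installed.

```python
from typing import List, Tuple


def page_chunks(page_count: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return inclusive, 0-based (first, last) page ranges of at most chunk_size pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    return [
        (start, min(start + chunk_size, page_count) - 1)
        for start in range(0, page_count, chunk_size)
    ]


def split_pdf_pymupdf(src_path: str, out_prefix: str, chunk_size: int = 1) -> List[str]:
    """Split src_path into several PDFs of chunk_size pages each using PyMuPDF."""
    import fitz  # PyMuPDF; recent releases also expose it as `pymupdf`

    written: List[str] = []
    src = fitz.open(src_path)
    try:
        for first, last in page_chunks(src.page_count, chunk_size):
            part = fitz.open()  # new, empty PDF document
            part.insert_pdf(src, from_page=first, to_page=last)  # copy the page range
            out_path = f"{out_prefix}_{first + 1}-{last + 1}.pdf"
            part.save(out_path)
            part.close()
            written.append(out_path)
    finally:
        src.close()
    return written
```

Because `insert_pdf` copies a whole page range in one call, chunking reduces to computing `(first, last)` pairs. Timing this against the current pikepdf loop would be one way to answer the efficiency question.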
-
### description
Replacing bookmarks leads to a message, and it is not specified whether it is an error or a warning (error code: `1`).
### actual behavior
_pdfcpu_ throws this message: `pdfcpu: corrupt book…`
-
PDF classifier for a freight brokerage company
* Depending on the type of document, trigger a workflow
* A trucking company receives various documents by email
* Workflow
  * Import PDF
  * Convert…
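The "depending on document type, trigger a workflow" step can be sketched as a classifier followed by a dispatch table. This is a minimal sketch that swaps in simple keyword matching on text already extracted from the PDF, not a trained classifier. The document types, keywords, and the `classify`/`route` names are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical document types and keywords; checked in insertion order.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "bill_of_lading": ("bill of lading", "b/l"),
    "rate_confirmation": ("rate confirmation", "rate con"),
    "invoice": ("invoice", "amount due"),
}


def classify(text: str) -> str:
    """Return the first document type whose keywords appear in text, else 'unknown'."""
    lowered = text.lower()
    for doc_type, keywords in KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return doc_type
    return "unknown"


def route(text: str, workflows: Dict[str, Callable[[str], str]]) -> str:
    """Run the workflow registered for the document type.

    Falls back to the 'unknown' workflow, which the caller must provide.
    """
    handler = workflows.get(classify(text), workflows["unknown"])
    return handler(text)
```

Keeping the classifier separate from the dispatch table means a keyword matcher could later be replaced by a model without touching the workflows.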