-
> @pudo proposed this idea in https://github.com/deanmalmgren/textract/pull/66#issuecomment-54709071 and I wanted to be sure to capture it before I forget.
With the way that the pdf parser currently…
-
`moto` made a backwards-incompatible change. I dealt with this one recently here as well:
- https://github.com/simonw/textract-cli/issues/1#issuecomment-2027578406
-
## Version **2.1.1** of [textract](https://github.com/dbashford/textract) just got published.
Branch: Build failing 🚨
Dependency: te…
-
TLDR:
```
pip install --upgrade argcomplete beautifulsoup4 chardet docx2txt EbookLib extract-msg IMAPClient lxml olefile pdfminer.six Pillow pip pycryptodome PyPDF2 python-pptx pytz setuptools six s…
```
-
I passed a file in an unsupported format to it using the following call:
`textract.process('./test.pyc')`
and I got the following error:
```python
Exception raised:
Traceback (most recent call last):
…
```
-
**Spaces and line breaks are rendered differently on Linux and macOS**
textract's handling of line breaks and spaces clearly differs between Linux and macOS.
On Linux, a line break is parsed as multiple \n\n, and a space …
-
I am parsing an existing JSON response from the asynchronous call **textract.start_document_analysis()**, but the parse fails. The document is a multipage PDF, and I get an AssertionError:
```
from text…
```
-
Hi all, I have just recently started to work with Textract and I think a simple feature could be added. This took me a while to figure out...
I was interested in getting the confidence scores direct…
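
For illustration, here is a minimal sketch of reading confidence scores straight from a response. It assumes the request refers to the AWS Textract JSON layout, where a top-level `Blocks` list holds entries with `BlockType`, `Text`, and `Confidence`. The helper name `line_confidences` and the sample data are hypothetical, not part of any library.

```python
from typing import Any, Dict, List, Tuple


def line_confidences(response: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Collect (text, confidence) pairs for every LINE block in a response dict."""
    return [
        (block.get("Text", ""), block["Confidence"])
        for block in response.get("Blocks", [])
        # PAGE blocks carry no confidence, so skip anything without the key.
        if block.get("BlockType") == "LINE" and "Confidence" in block
    ]


# Hypothetical, heavily trimmed response used only to exercise the helper.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice 42", "Confidence": 99.5},
        {"BlockType": "WORD", "Text": "Invoice", "Confidence": 99.7},
    ]
}

scores = line_confidences(sample_response)
```

Changing the `BlockType` filter to `"WORD"` would give per-word scores instead.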
-
Using Python 3.7.6, Pip 20.0.2, Conda 4.8.2, Spyder 4.0.1, and Textract 1.6.3.
When using textract.process('url', method='METHOD'), 'pdftotext' executes without problem (but the pdf is not text so …
-
Hi all,
Based on my understanding, Textract provides an axis-aligned BoundingBox object and a Polygon object which is composed of more specific points (https://docs.aws.amazon.com/textract/latest/d…
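
As a rough illustration of how the two geometry shapes relate, the sketch below derives an axis-aligned box from a list of polygon points. The field names (`X`, `Y`, `Left`, `Top`, `Width`, `Height`) follow the AWS Textract geometry layout. The function name `bounding_box_from_polygon` and the sample coordinates are made up for this example.

```python
from typing import Dict, List


def bounding_box_from_polygon(polygon: List[Dict[str, float]]) -> Dict[str, float]:
    """Return the smallest axis-aligned box that encloses all polygon points."""
    if not polygon:
        raise ValueError("polygon must contain at least one point")
    xs = [point["X"] for point in polygon]
    ys = [point["Y"] for point in polygon]
    left, top = min(xs), min(ys)
    return {
        "Left": left,
        "Top": top,
        "Width": max(xs) - left,
        "Height": max(ys) - top,
    }


# Hypothetical, slightly skewed quadrilateral in normalized page coordinates.
sample_polygon = [
    {"X": 0.25, "Y": 0.25},
    {"X": 0.75, "Y": 0.5},
    {"X": 0.75, "Y": 0.75},
    {"X": 0.25, "Y": 0.5},
]

box = bounding_box_from_polygon(sample_polygon)
```

For skewed or rotated text the enclosing box is larger than the polygon itself, which is why the polygon is the more specific of the two shapes.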