-
`text = textract.process(file, method='pdfminer')`
Error:
```
UnboundLocalError Traceback (most recent call last)
in ()
----> 1 text = textract.process(file, method='pdfmine…
```
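While the `pdfminer` path raises `UnboundLocalError`, one possible workaround is to try another PDF method when the first one fails. This is a minimal sketch. It assumes deanmalmgren/textract's `textract.process(path, method=...)` and that `pdftotext` is an acceptable fallback. `process_with_fallback` is a hypothetical helper, not part of textract. The `process` parameter exists only so a stub can be injected for testing.

```python
from typing import Callable, List, Optional, Sequence


def process_with_fallback(
    path: str,
    methods: Sequence[str] = ("pdfminer", "pdftotext"),
    process: Optional[Callable[..., bytes]] = None,
) -> bytes:
    """Try each textract method in order and return the first successful result.

    Hypothetical helper: `process` defaults to textract.process but can be
    replaced by a stub for testing.
    """
    if process is None:
        import textract  # deanmalmgren/textract, imported lazily

        process = textract.process
    errors: List[str] = []
    for method in methods:
        try:
            return process(path, method=method)
        except Exception as exc:  # e.g. the UnboundLocalError from pdfminer above
            errors.append(f"{method}: {type(exc).__name__}: {exc}")
    # Every method failed: report all of the errors, not just the last one.
    raise RuntimeError("all methods failed:\n" + "\n".join(errors))
```

Usage would be `text = process_with_fallback(file)`. This only works around the error and does not fix the underlying bug in the pdfminer path.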
-
> @pudo proposed this idea in https://github.com/deanmalmgren/textract/pull/66#issuecomment-54709071 and I wanted to be sure to capture it before I forget.
With the way that the pdf parser currently…
-
I am parsing an existing JSON response from the asynchronous call **textract.start_document_analysis()**, but it fails to parse. The input is a multipage PDF. I get an AssertionError:
```
from text…
```
-
## Version **2.1.1** of [textract](https://github.com/dbashford/textract) just got published.
Branch
Build failing 🚨
Dependency
te…
-
Version: 0.13
Using merged cell example:
`headers = table.get_header_field_names()`
This fails with `AttributeError: 'Table' object has no attribute 'get_header_field_names'`.
-
If you extract both LAYOUT and TABLES, the tables are, for some reason, printed at the end of the output rather than linearized in their correct position.
Related issue: https://github.com/aws-samples/amazon-textrac…
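Until the linearizer handles this, one possible workaround is to merge the layout and table text yourself by position on the page. The sketch below assumes you have already pulled out each layout block and each table as text. Each one needs its page number and the `Top` of its bounding box, which Textract reports as a fraction of page height. The `Chunk` type and the `linearize` function are hypothetical and are not Textractor APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    page: int   # 1-based page number
    top: float  # bounding-box Top as a fraction of page height (0.0 = top of page)
    text: str


def linearize(layout: List[Chunk], tables: List[Chunk]) -> str:
    """Interleave tables with layout text by (page, vertical position).

    sorted() is stable, so on an exact tie the layout chunk stays before the table.
    """
    ordered = sorted(layout + tables, key=lambda c: (c.page, c.top))
    return "\n".join(c.text for c in ordered)
```

Ordering by `(page, top)` is only a reasonable approximation for single-column pages. Multi-column layouts would need a reading-order key that also considers `Left`.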
-
I passed an unsupported file format to it using the following:
`textract.process('./test.pyc')`
and I got the following error:
```python
Exception raised:
Traceback (most recent call last):
…
```
-
TL;DR:
```
pip install --upgrade argcomplete beautifulsoup4 chardet docx2txt EbookLib extract-msg IMAPClient lxml olefile pdfminer.six Pillow pip pycryptodome PyPDF2 python-pptx pytz setuptools six s…
```
-
**Spaces and line breaks differ between Linux and macOS**
textract's handling of line breaks and spaces is clearly different between Linux and macOS.
On Linux, a line break is parsed as multiple newlines (`\n\n`), and a space …
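If the goal is output that compares equal on both platforms, one option is to normalize whitespace after extraction. This is a minimal sketch. `normalize_whitespace` is a hypothetical helper, and it is lossy by design: it discards blank lines between paragraphs.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Make extracted text comparable across platforms (lossy normalization)."""
    # Unify line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Any run of non-newline whitespace (spaces, tabs, non-breaking spaces) -> one space.
    text = re.sub(r"[^\S\n]+", " ", text)
    # Drop spaces around newlines and collapse runs of newlines into one.
    text = re.sub(r" ?\n[ \n]*", "\n", text)
    return text.strip()
```

For example, `normalize_whitespace("Hello  world \n\n\nnext\tline")` gives `"Hello world\nnext line"`. Apply the same function to both the Linux and macOS output before comparing them.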
-
Using Python 3.7.6, pip 20.0.2, Conda 4.8.2, Spyder 4.0.1, and textract 1.6.3.
When using `textract.process('url', method='METHOD')`, `pdftotext` runs without problems (but the PDF is not text, so …
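For PDFs that are scanned images, `pdftotext` can succeed but return little more than whitespace and form feeds. The sketch below detects that case and retries with OCR. It assumes deanmalmgren/textract's `tesseract` method for PDFs is available, which requires Tesseract to be installed. `extract_or_ocr` and `min_chars` are hypothetical names, and `process` is injectable only for testing.

```python
from typing import Callable, Optional, Tuple


def extract_or_ocr(
    path: str,
    process: Optional[Callable[..., bytes]] = None,
    min_chars: int = 1,
) -> Tuple[bytes, str]:
    """Return (text, method_used), falling back to OCR when pdftotext finds no text."""
    if process is None:
        import textract  # deanmalmgren/textract, imported lazily

        process = textract.process
    text = process(path, method="pdftotext")
    # Image-only PDFs typically yield only whitespace / form feeds from pdftotext;
    # bytes.strip() removes spaces, newlines and form feeds.
    if len(text.strip()) >= min_chars:
        return text, "pdftotext"
    return process(path, method="tesseract"), "tesseract"
```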