atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Extract text from specific area #437

Closed. abdullasalimov closed this issue 4 years ago.

abdullasalimov commented 4 years ago

Good day. I have a standard PDF file with a drawing in it, and I want to extract text from a specific area of it (Table 1 and Table 2 in the attached PDF file).

Here is my code:

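import camelot

dest = 'pdf file path'  # placeholder: path to the attached PDF

# flavor defaults to 'lattice', which rasterizes the page and processes it with OpenCV
table1 = camelot.read_pdf(dest, table_regions=['28,82,648,864'])
print(table1[0].df)  # first extracted table as a pandas DataFrame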
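Camelot's documentation describes region strings as `x1,y1,x2,y2`, where (x1, y1) is the left-top corner and (x2, y2) the right-bottom corner in PDF coordinate space, with the origin at the bottom-left of the page. In the string above, y1 (82) is smaller than y2 (864), which under that convention would put the top edge below the bottom edge. If those numbers were read from a viewer that measures from the top-left corner, they may need flipping. The helper below is a hypothetical sketch of that conversion (`to_camelot_area` is not part of camelot), assuming you know the page height in points.

```python
def to_camelot_area(left, top, right, bottom, page_height):
    """Hypothetical helper: convert a rectangle measured from the page's
    top-left corner (in PDF points) into an "x1,y1,x2,y2" region string.

    Assumes camelot's documented convention: (x1, y1) is the left-top
    corner and (x2, y2) the right-bottom corner, with the origin at the
    bottom-left of the page, so y1 ends up larger than y2.
    """
    if right <= left or bottom <= top:
        raise ValueError("expected left < right and top < bottom")
    # Flip the vertical axis: distance from the top -> distance from the bottom.
    y1 = page_height - top
    y2 = page_height - bottom
    # ':g' prints whole numbers without a trailing '.0' (up to 6 significant digits).
    return f"{left:g},{y1:g},{right:g},{y2:g}"


area = to_camelot_area(28, 82, 300, 400, page_height=792)
print(area)
```

For a US Letter page (792 pt tall) this prints `28,710,300,392`. A string like that could then be passed as `camelot.read_pdf(dest, flavor='stream', table_areas=[area])`. The stream flavor works from the text positions in the PDF instead of rasterizing the page, so it should not reach the OpenCV `adaptive_threshold` call seen in the traceback.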

But it throws an error:

>>> table1 = camelot.read_pdf(d, table_regions=['28,82,648,864'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\abdulla.salimov\AppData\Local\Programs\Python\Python38-32\lib\site-packages\camelot\io.py", line 113, in read_pdf
    tables = p.parse(
  File "C:\Users\abdulla.salimov\AppData\Local\Programs\Python\Python38-32\lib\site-packages\camelot\handlers.py", line 171, in parse
    t = parser.extract_tables(
  File "C:\Users\abdulla.salimov\AppData\Local\Programs\Python\Python38-32\lib\site-packages\camelot\parsers\lattice.py", line 403, in extract_tables
    self._generate_table_bbox()
  File "C:\Users\abdulla.salimov\AppData\Local\Programs\Python\Python38-32\lib\site-packages\camelot\parsers\lattice.py", line 236, in _generate_table_bbox
    self.image, self.threshold = adaptive_threshold(
  File "C:\Users\abdulla.salimov\AppData\Local\Programs\Python\Python38-32\lib\site-packages\camelot\image_processing.py", line 43, in adaptive_threshold
    threshold = cv2.adaptiveThreshold(
cv2.error: OpenCV(4.2.0) C:\projects\opencv-python\opencv\modules\core\src\alloc.cpp:73: error: (-4:Insufficient memory) Failed to allocate 139225504 bytes in function 'cv::OutOfMemoryError'
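The install path in the traceback (`Python38-32`) suggests a 32-bit Python build. A 32-bit Windows process typically has only about 2 GB of user address space, so a single contiguous allocation of the size OpenCV reports can fail even on a machine with plenty of RAM. That is a likely explanation, not one confirmed in this issue. A quick check of the interpreter, sketched with only the standard library:

```python
import struct
import sys


def interpreter_bits() -> int:
    # Size of a C pointer in bytes, times 8: 32 on a 32-bit build, 64 on a 64-bit one.
    return struct.calcsize("P") * 8


def bytes_to_mib(n: int) -> float:
    return n / (1024 * 1024)


print(f"Python {sys.version.split()[0]}, {interpreter_bits()}-bit")
# Allocation size reported by OpenCV in the traceback above.
print(f"Failed allocation: {bytes_to_mib(139225504):.1f} MiB")
```

If this reports 32-bit, installing a 64-bit Python and reinstalling camelot and opencv-python into it would be a reasonable next step to try.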

What am I doing wrong?

Attachment: attachement.pdf
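The lattice flavor renders the page to an image before running OpenCV on it, so the memory needed grows with page area times the square of the resolution, and large-format drawings can produce very large images. The helper below is a rough, hypothetical estimate (`render_size` is illustrative, not a camelot function), assuming an 8-bit grayscale render (one byte per pixel) at 300 DPI; OpenCV may allocate additional working buffers on top of this.

```python
def render_size(width_pt, height_pt, dpi=300):
    """Hypothetical estimate of a rasterized page: (width_px, height_px, bytes).

    PDF sizes are in points (1/72 inch). Bytes assume 8-bit grayscale,
    one byte per pixel; real tools may allocate more than this.
    """
    width_px = round(width_pt / 72 * dpi)
    height_px = round(height_pt / 72 * dpi)
    return width_px, height_px, width_px * height_px


print(render_size(612, 792))  # US Letter at 300 DPI -> (2550, 3300, 8415000)
```

A US Letter page comes out at 2550 × 3300 pixels, about 8.4 MB; a large drawing sheet grows in proportion to its area.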