Layout-Parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis
https://layout-parser.github.io/
Apache License 2.0
4.67k stars, 449 forks

index error #112

Closed: under-score closed this issue 2 years ago

under-score commented 2 years ago

Traceback (most recent call last):

  File "/Users/user/Documents/Daten/Projekte/Scripts/opencv_v2/my.py", line 85, in <module>
    pdflayout, images = lp.load_pdf(os.path.join(dir, fn), load_images=True, dpi=300)

  File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/layoutparser/io/pdf.py", line 182, in load_pdf
    page_tokens = extract_words_for_page(

  File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/layoutparser/io/pdf.py", line 57, in extract_words_for_page
    df[["x0", "x1"]].clip(lower=0, upper=int(page.width)).astype("float")

  File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 3464, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1)[1]

  File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py", line 1314, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis)

  File "/Users/user/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexing.py", line 1374, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")

KeyError: "None of [Index(['x0', 'x1'], dtype='object')] are in the [columns]"

Is there also a space in the column index?

lolipopshock commented 2 years ago

Interesting -- would you mind helping me check if there's an empty page in the input PDF document? Thanks!

under-score commented 2 years ago

Sorry, the file is gone, but I can't remember any PDF with a blank page.

lolipopshock commented 2 years ago

I suspect it's caused by empty PDF pages, which I should fix in the next few updates. I'll close this issue for now, but feel free to reopen it if you can find that PDF and take a look at it. Thanks!
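
For anyone who lands here with the same traceback, the empty-page hypothesis can be checked with pandas alone. This is a minimal sketch, not the library's actual fix: it assumes the word list for a page with no text comes back empty, so the resulting DataFrame has no columns and selecting `["x0", "x1"]` raises the `KeyError` shown above. The helper `clip_word_boxes` and the `COLUMNS` list are hypothetical names made up for this illustration.

```python
import pandas as pd

# Hypothetical column set for this sketch; the real word table may differ.
COLUMNS = ["x0", "x1", "top", "bottom", "text"]


def clip_word_boxes(words, page_width):
    """Build a word table and clip x-coordinates to the page width.

    Returns an empty table with the expected columns when the page has
    no words, instead of failing on the column lookup.
    """
    if not words:
        return pd.DataFrame(columns=COLUMNS)
    df = pd.DataFrame(words)
    df[["x0", "x1"]] = (
        df[["x0", "x1"]].clip(lower=0, upper=int(page_width)).astype("float")
    )
    return df


# An empty word list gives a DataFrame with no columns, so the lookup fails.
try:
    pd.DataFrame([])[["x0", "x1"]]
except KeyError as err:
    print("KeyError:", err)

# The guarded helper handles both the empty and the non-empty case.
empty = clip_word_boxes([], page_width=612)
words = [{"x0": -3.0, "x1": 700.0, "top": 10.0, "bottom": 20.0, "text": "hello"}]
clipped = clip_word_boxes(words, page_width=612)
print(len(empty), clipped[["x0", "x1"]].values.tolist())
```

If the guard is what is missing, a workaround until a fix lands would be to skip or pre-filter pages that contain no extractable text before calling `lp.load_pdf`.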