Closed egork520 closed 1 year ago
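Hello @kyleclo, I found an issue: the parser indexes `all_row_ids[-1]` (and, a few lines later, `all_word_ids[-1]`) even when no words are detected on the page, which raises an `IndexError` because the list is empty. I could try to fix it by first checking whether the list is empty, but if you know a better fix, please let me know.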
Here is a screenshot of the page:
And the paper:
f87f9a26543e03c985867d0dbff1b900ecb6e46d.pdf
Here is the stack trace:
```
File ~/Documents/codes/git/ai2/s2/mmda/src/mmda/parsers/pdfplumber_parser.py:170, in PDFPlumberParser.parse(self, input_pdf_path)
    166 all_tokens.extend(fine_tokens)
    167 all_row_ids.extend(
    168     [i + last_row_id + 1 for i in line_ids_of_fine_tokens]
    169 )
--> 170 last_row_id = all_row_ids[-1]
    171 all_word_ids.extend(
    172     [i + last_word_id + 1 for i in word_ids_of_fine_tokens]
    173 )
    174 last_word_id = all_word_ids[-1]

IndexError: list index out of range
```
Link to the fix: PR
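For reference, the empty-list check proposed above could look roughly like the sketch below. This is not the code from the linked PR or from `pdfplumber_parser.py`. The function `accumulate_ids`, its `pages` argument, and the `-1` starting values are assumptions made for illustration. Only the two `extend` expressions and the `[-1]` lookups mirror the lines shown in the stack trace.

```python
from typing import List, Sequence, Tuple


def accumulate_ids(
    pages: Sequence[Tuple[List[int], List[int]]],
) -> Tuple[List[int], List[int]]:
    """Merge per-page row/word ids into document-wide ids.

    Hypothetical helper: each item in `pages` is a pair
    (line_ids_of_fine_tokens, word_ids_of_fine_tokens) with ids that
    start at 0 on every page.
    """
    all_row_ids: List[int] = []
    all_word_ids: List[int] = []
    # Assumed starting values, so that the first id assigned is 0.
    last_row_id = -1
    last_word_id = -1

    for line_ids_of_fine_tokens, word_ids_of_fine_tokens in pages:
        # Guard: a page with no detected tokens contributes nothing, so
        # skip it instead of indexing [-1] on a possibly empty list.
        if not line_ids_of_fine_tokens or not word_ids_of_fine_tokens:
            continue

        all_row_ids.extend(
            [i + last_row_id + 1 for i in line_ids_of_fine_tokens]
        )
        last_row_id = all_row_ids[-1]
        all_word_ids.extend(
            [i + last_word_id + 1 for i in word_ids_of_fine_tokens]
        )
        last_word_id = all_word_ids[-1]

    return all_row_ids, all_word_ids
```

Skipping the page keeps `last_row_id` and `last_word_id` unchanged, so ids on the next non-empty page continue from where the previous non-empty page stopped. Whether an empty page should instead be recorded in some other way is a design question for the maintainers.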