StatCan / SLICEmyPDF

This project uses the SLICE algorithm to extract information from a text-based PDF page containing financial statements (tabular data). It can also be used to extract regular tables, but the output will contain all of the text on the page.

Exception: Unable to post-process table! Try extract(FS_flag=False) #2

Closed: PL450 closed this issue 3 years ago

PL450 commented 3 years ago

Hi

I encountered the above error for tables whose headers consist of dates like "1H21" or of item names.

But when I check the grid view of the image, all the columns are correctly partitioned. So why does the conversion to a pandas DataFrame fail? Must the headers be dates for the code to convert the table into a pandas DataFrame?

Thank you!

PL450 commented 3 years ago

Oh, I've now read your instructions in instructions.ipynb. I see that FS_flag=True is only meant for tables that meet the following assumptions:

- The page has only 1 table.
- The start of the table is marked by a complete date or a year (2000 onwards).
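To see why a header like "1H21" can fail under FS_flag=True, here is a minimal sketch of the second assumption as a standalone check. It is not SLICEmyPDF's actual code: the function name `is_table_start_header` and the list of accepted date formats are hypothetical, chosen only to illustrate the rule "complete date or a year of 2000 or later".

```python
import re
from datetime import datetime

# Hypothetical illustration of the stated assumption, not SLICEmyPDF code:
# a table starts at a header that is a complete date or a year >= 2000.
YEAR_RE = re.compile(r"^(\d{4})$")

# Hypothetical set of "complete date" formats to accept.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%d %B %Y")


def is_table_start_header(token: str) -> bool:
    """Return True if token is a year >= 2000 or parses as a complete date."""
    token = token.strip()
    match = YEAR_RE.match(token)
    if match:
        return int(match.group(1)) >= 2000
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(token, fmt)
            return True
        except ValueError:
            continue  # try the next date format
    return False


for header in ["2021", "1999", "December 31, 2021", "1H21", "Revenue"]:
    print(header, is_table_start_header(header))
```

Under this rule, half-year labels such as "1H21" and item names such as "Revenue" are neither a full date nor a four-digit year, so no table start would be detected. That is consistent with the error message's suggestion to fall back to extract(FS_flag=False) for such tables.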