TensorFlow implementation of the TableNet research paper.
Blog link: https://ashishsalaskar1.medium.com/tablenet-end-to-end-table-detection-and-tabular-data-extraction-from-scanned-document-images-13c846e8f8f5
You can find a demo of the model here: https://youtu.be/wvREr71zPe4
The task is to extract tables from document images as a two-step process. First, we train a model to detect table regions, i.e. the areas of the image that contain a table. Once table regions are found, a second model recognizes the structure of each table. The process therefore splits into two steps: 1) Table Detection (detect the table region) and 2) Table Structure Segmentation (detect the rows and columns within the detected table).
In our model, we integrate both of these steps into a single end-to-end trainable deep learning model.
We will be using the Marmot dataset.
To extract the bounding boxes from the table masks in the Marmot dataset, we first need to convert the hex-encoded numbers into integers. The Marmot annotations define a CropBox that represents the content area of the page, and the bounding-box attributes are relative to this CropBox.
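As a rough illustration, here is a minimal sketch of this conversion, assuming each coordinate is stored as a hexadecimal string (the helper names, and the decoding of 16-digit values as IEEE 754 doubles, are assumptions rather than the dataset's documented format):

```python
import struct

def hex_to_number(h: str) -> float:
    """Decode a hex-encoded coordinate string. Short values are treated
    as plain hex integers; 16-digit values are assumed to be
    IEEE 754 doubles (an assumption about the Marmot encoding)."""
    h = h[2:] if h.startswith("0x") else h
    if len(h) == 16:
        return struct.unpack("!d", bytes.fromhex(h))[0]
    return int(h, 16)

def bbox_relative_to_cropbox(bbox, cropbox):
    """Shift an absolute (x0, y0, x1, y1) box so its coordinates
    are relative to the CropBox (content-area) origin."""
    cx0, cy0, _, _ = cropbox
    x0, y0, x1, y1 = bbox
    return (x0 - cx0, y0 - cy0, x1 - cx0, y1 - cy0)
```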
Before we start working on the model, we first build a TensorFlow input pipeline (a data loader built with tf.data) for our dataset. This is more efficient than loading and preprocessing each image individually during training. As part of the pipeline, we also apply preprocessing to each image and its masks.
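A minimal sketch of such a pipeline, assuming image and mask files on disk and a fixed input resolution (the file formats, path lists, image size, and batch size below are assumptions):

```python
import tensorflow as tf

IMG_SIZE = 512  # assumed input resolution

def load_sample(img_path, table_mask_path, col_mask_path):
    # Decode and resize the document image and its two target masks,
    # scaling pixel values to [0, 1]
    image = tf.io.decode_jpeg(tf.io.read_file(img_path), channels=3)
    image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE]) / 255.0
    table = tf.io.decode_png(tf.io.read_file(table_mask_path), channels=1)
    table = tf.image.resize(table, [IMG_SIZE, IMG_SIZE]) / 255.0
    column = tf.io.decode_png(tf.io.read_file(col_mask_path), channels=1)
    column = tf.image.resize(column, [IMG_SIZE, IMG_SIZE]) / 255.0
    return image, (table, column)

# img_paths, table_mask_paths, col_mask_paths: hypothetical lists of file paths
dataset = (
    tf.data.Dataset.from_tensor_slices((img_paths, table_mask_paths, col_mask_paths))
    .map(load_sample, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(8)
    .prefetch(tf.data.AUTOTUNE)
)
```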
The model architecture consists of three main parts: a shared backbone/encoder and two decoder branches. Backbone / Encoder: here we use a pre-trained VGG-19 model as the encoder. The fully connected layers of VGG-19 (the layers after pool5) are replaced with two (1x1) convolutional layers, whose output is shared between the table and column decoder branches.
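A minimal Keras sketch of this encoder plus the two decoder branches, loosely following the paper's FCN-style skip connections (the exact layer widths, dropout rate, and upsampling scheme here are assumptions):

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG19

inputs = layers.Input(shape=(512, 512, 3))
vgg = VGG19(include_top=False, weights="imagenet", input_tensor=inputs)
pool3 = vgg.get_layer("block3_pool").output
pool4 = vgg.get_layer("block4_pool").output
pool5 = vgg.get_layer("block5_pool").output

# Two (1x1) conv layers replace the fully connected layers after pool5
x = layers.Conv2D(512, 1, activation="relu")(pool5)
x = layers.Dropout(0.8)(x)  # dropout rate is an assumption
x = layers.Conv2D(512, 1, activation="relu")(x)
x = layers.Dropout(0.8)(x)

def decoder_branch(feat, name):
    # Upsample and fuse skip connections from pool4 and pool3 (FCN-style)
    y = layers.Conv2D(512, 1, activation="relu")(feat)
    y = layers.UpSampling2D(2)(y)
    y = layers.Concatenate()([y, pool4])
    y = layers.UpSampling2D(2)(y)
    y = layers.Concatenate()([y, pool3])
    y = layers.UpSampling2D(8)(y)  # back to the input resolution
    return layers.Conv2D(1, 1, activation="sigmoid", name=name)(y)

table_mask = decoder_branch(x, "table_mask")
column_mask = decoder_branch(x, "column_mask")
model = Model(inputs, [table_mask, column_mask])
```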
Once we get the predicted masks from the model, we use a few OpenCV functions to smooth them. First, a Gaussian blur is applied to each mask, and then we find the contours in the blurred mask. For each contour found, we fit a bounding rectangle, which yields noticeably cleaner masks. We also ignore very small regions, which typically correspond to irregular points or patches. With these post-processing steps, the resulting masks are significantly better.
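A minimal sketch of this post-processing, assuming a single-channel mask with float values in [0, 1] (the blur kernel size and minimum-area threshold are assumptions):

```python
import cv2
import numpy as np

def postprocess_mask(mask, min_area=1000):
    """Smooth a predicted mask: blur it, find contours, and redraw each
    sufficiently large contour as a filled bounding rectangle."""
    mask = (mask * 255).astype(np.uint8)
    mask = cv2.GaussianBlur(mask, (5, 5), 0)
    _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    # OpenCV 4.x returns (contours, hierarchy)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    clean = np.zeros_like(mask)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h >= min_area:  # ignore very small, irregular patches
            cv2.rectangle(clean, (x, y), (x + w, y + h), 255, thickness=-1)
    return clean
```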
Once the table masks are predicted, we use the PyTesseract library to extract the text from the tables and save it into CSV files. We follow these steps:
1. Take the input image, get the predicted masks from the model, and apply the post-processing described above.
2. Using these masks, extract the regions of interest from the original image, masking the non-table regions with black.
3. Use OpenCV functions to crop out the individual tables.
4. For each table, apply some processing such as sharpening filters, resizing, and thresholding, then save that table image separately.
5. Pass each table region to PyTesseract to extract its text, and save the text into a CSV file.
In the end, we get images of the individual tables as well as a CSV file for each table.
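A minimal sketch of this extraction step, assuming a post-processed binary table mask and a BGR input image (the sharpening kernel, scaling factor, output filenames, and the whitespace-based splitting of OCR text into CSV cells are assumptions):

```python
import cv2
import numpy as np
import pandas as pd
import pytesseract

def extract_tables_to_csv(image, table_mask):
    """Crop each detected table region, enhance it, OCR it with
    PyTesseract, and save an image and a CSV per table."""
    # Black out all non-table regions of the original image
    masked = cv2.bitwise_and(image, image, mask=table_mask)
    contours, _ = cv2.findContours(table_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for i, c in enumerate(contours):
        x, y, w, h = cv2.boundingRect(c)
        table = masked[y:y + h, x:x + w]
        # Sharpen, upscale, and threshold the crop before OCR
        sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
        table = cv2.filter2D(table, -1, sharpen)
        table = cv2.resize(table, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
        gray = cv2.cvtColor(table, cv2.COLOR_BGR2GRAY)
        _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        cv2.imwrite(f"table_{i}.png", binary)
        # OCR the table and split each line on whitespace into CSV cells
        text = pytesseract.image_to_string(binary)
        rows = [line.split() for line in text.splitlines() if line.strip()]
        pd.DataFrame(rows).to_csv(f"table_{i}.csv", index=False, header=False)
```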