DevashishPrasad / CascadeTabNet

This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"
MIT License

How to train your model from scratch? #42

Closed StanleyGan closed 4 years ago

StanleyGan commented 4 years ago

Hi! Very interesting paper and I am interested in training the model from scratch. Do you have a script available for reference? I am not an expert in object detection and a reference script for full pipeline training would be greatly appreciated.

Thanks.

DevashishPrasad commented 4 years ago

Hello Stanley,

To train the model from scratch just don't load the pretrained checkpoint in the config file.

For training custom mmdetection models, you can refer to https://www.dlology.com/blog/how-to-train-an-object-detection-model-with-mmdetection/
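As a rough illustration of the advice above, here is a minimal sketch of the relevant config lines. It assumes an older mmdetection-style config (a plain Python file) with top-level `load_from` and `resume_from` keys and a `pretrained` entry in the model dict. Key names differ between mmdetection versions, so check them against the config shipped with the repo.

```python
# Hypothetical excerpt of an mmdetection-style config file, not the repo's
# actual config. Only the lines relevant to "training from scratch" are shown.

# Do not initialise the detector from a previously trained checkpoint.
load_from = None

# Do not resume an interrupted training run either.
resume_from = None

# Assumption: the model dict also carries a `pretrained` entry that points at
# backbone weights. Set it to None only if the backbone should start from
# random weights as well. All other model fields are omitted in this sketch.
model = dict(pretrained=None)
```

Leaving `pretrained` at its original value while setting `load_from = None` keeps a pretrained backbone but trains the detection heads from scratch, which is usually the cheaper option.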

StanleyGan commented 4 years ago

So you just have to change the config file and everything works? I don't see the code where the image transformation and augmentation are being applied.

DevashishPrasad commented 4 years ago

Yes, just change the config and give the relevant paths.

Image transformation and augmentation were done before training the model. All of the data was first augmented and saved to disk, and then training was performed on the saved data.

We used static augmentation, but you can also perform on-the-fly (dynamic) augmentation during training.
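For on-the-fly augmentation, one option is to add a random transform to the training pipeline in the config. The sketch below assumes the pipeline format used by mmdetection 1.x/2.x. The step names and values are common mmdetection defaults, not the settings used by the CascadeTabNet authors, and newer mmdetection releases rename some of these keys.

```python
# Hypothetical training pipeline in the mmdetection 1.x/2.x config style.
# Normalisation values are the usual ImageNet defaults (an assumption here).
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    # Dynamic augmentation: each image is flipped with probability 0.5
    # every time it is loaded, so nothing has to be saved beforehand.
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
```

Static augmentation trades disk space for a fixed, reproducible training set. Dynamic augmentation produces a different variant of each image on every epoch at the cost of extra work in the data loader.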

StanleyGan commented 4 years ago

@DevashishPrasad Thanks for your reply! May I know more about the result variable presented in your demo? It returns a list of 2 objects, and I am unsure what they are. Each object has length 80, which is the number of classes defined in the config file. Where are the coordinates of the bounding boxes and the table type? How can I extract this information from result?

Thanks!

DevashishPrasad commented 4 years ago

Here is an example of how you can use the result variable. Just skip the empty lists in it and draw the bounding boxes accordingly.

from mmdet.apis import init_detector, inference_detector
import mmcv
import mmcv.visualization.image as mmcv_image
import cv2
from google.colab.patches import cv2_imshow

config_file = 'Path_to/19cascade_mask_rcnn_hrnetv2p_w32_20e.py'
checkpoint_file = 'Path_to/epoch.pth'

# build the model from a config file and a checkpoint file
model = init_detector(config_file, checkpoint_file, device='cuda:0')

# Make inference from model and get the result
img = '/content/3.jpg'
result = inference_detector(model, img)

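# Read the image
im = cv2.imread(img)

# result[0] holds the bounding boxes: one array per class, one row per detection
bbox_result = result[0]
for class_boxes in bbox_result:
  # Classes with no detections are empty, so the inner loop simply skips them
  for box in class_boxes:
    # Each row is (x1, y1, x2, y2, score): two corners, not a width and height
    x1, y1, x2, y2, score = box
    if score < 0.5:
      continue
    cv2.rectangle(im, (int(x1), int(y1)), (int(x2), int(y2)), (255, 0, 0), 2)

# Show a resized preview and save the annotated image
cv2_imshow(cv2.resize(im, (800, 800)))
cv2.imwrite("/content/rest3.jpg", im)

StanleyGan commented 4 years ago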

@DevashishPrasad Thanks a lot for your guide! :)

StanleyGan commented 4 years ago

@DevashishPrasad I saw that there are two non-empty lists in result for my image. Does one represent the bounding boxes of the cells (table structure recognition) and the other the bounding box of the whole table (table detection)?

kshitijkapadni commented 4 years ago

Yeah, the result is represented as follows:

1st list : Bordered table Bounding Boxes
2nd list : Cells Bounding Boxes
3rd list : Borderless Bounding Boxes
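The ordering described above can be turned into a small helper that labels each detection. This is only a sketch: `split_detections`, `CLASS_NAMES`, and the 0.5 threshold are illustrative names and values, and the input is assumed to be `result[0]` from `inference_detector`, where each row is `(x1, y1, x2, y2, score)`.

```python
# Class order as described in this thread (index 0, 1, 2 of the box results).
CLASS_NAMES = ("bordered_table", "cell", "borderless_table")


def split_detections(bbox_result, score_thr=0.5):
    """Group detections by class name, keeping boxes scored >= score_thr.

    bbox_result: sequence with one entry per class; each entry is a sequence
    of rows (x1, y1, x2, y2, score). Entries beyond the first three classes
    are ignored, which covers configs that declare more classes than are used.
    """
    detections = {name: [] for name in CLASS_NAMES}
    for name, class_boxes in zip(CLASS_NAMES, bbox_result):
        for x1, y1, x2, y2, score in class_boxes:
            if score >= score_thr:
                detections[name].append(
                    (int(x1), int(y1), int(x2), int(y2), float(score)))
    return detections


# Usage with made-up numbers standing in for result[0]:
example = [
    [[10, 20, 110, 220, 0.9], [0, 0, 5, 5, 0.2]],  # bordered tables
    [],                                            # cells
    [[1.5, 2.5, 3.5, 4.5, 0.5]],                   # borderless tables
]
by_class = split_detections(example)
```

The table type of a detection is then simply the key it appears under, and each tuple carries the corner coordinates of one box.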

VIBIN1234 commented 3 years ago

@DevashishPrasad I am able to extract table co-ordinates for an image as you mentioned above. But I want to extract coordinates separately for each table. Please help me to get the coordinates of the bounding boxes for each table separately.

kshitijkapadni commented 3 years ago

Hello @VIBIN1234, the result variable contains 3 lists as follows:

1st list : Bordered table Bounding Boxes
2nd list : Cells Bounding Boxes
3rd list : Borderless Bounding Boxes

Each of these lists contains one entry per detected table or cell, so you can read the coordinates of each table separately from the result variable.
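If the cells also need to be separated per table, one possible approach is to assign each cell to the table whose box contains the cell's centre. This rule is an assumption made for illustration and is not something the repository is known to do; `group_cells_by_table` is a hypothetical helper, and boxes are assumed to be rows of `(x1, y1, x2, y2, score)`.

```python
def group_cells_by_table(table_boxes, cell_boxes):
    """Return one list of cell boxes per table, in the order of table_boxes.

    A cell belongs to the first table whose box contains the cell centre.
    Cells that fall inside no table are dropped.
    """
    groups = [[] for _ in table_boxes]
    for cell in cell_boxes:
        cx1, cy1, cx2, cy2 = cell[:4]
        centre_x = (cx1 + cx2) / 2.0
        centre_y = (cy1 + cy2) / 2.0
        for index, table in enumerate(table_boxes):
            tx1, ty1, tx2, ty2 = table[:4]
            if tx1 <= centre_x <= tx2 and ty1 <= centre_y <= ty2:
                groups[index].append((cx1, cy1, cx2, cy2))
                break
    return groups


# Usage with made-up boxes: two tables and three cells, one outside both.
tables = [[0, 0, 100, 100, 0.9], [200, 0, 300, 100, 0.8]]
cells = [[10, 10, 20, 20, 0.9], [210, 10, 220, 20, 0.9], [500, 500, 510, 510, 0.9]]
cells_per_table = group_cells_by_table(tables, cells)
```

Each index in `cells_per_table` lines up with the same index in `tables`, so a table's coordinates and its cells can be processed together.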

VIBIN1234 commented 3 years ago

Thank you so much, I got it.