DevashishPrasad / CascadeTabNet

This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"
MIT License

Unclear about transfer learning in the paper #67

Closed · kkissmart closed this issue 4 years ago

kkissmart commented 4 years ago

In your paper, did you initialize your model with the 80-class COCO-pretrained model, then fine-tune it on orig_data + TableBank for table detection only (one class), and then follow that with table structure detection?

Please clarify. Again, congrats on this paper; it is very impactful.

Thanks!

DevashishPrasad commented 4 years ago

Yes, first the model is initialized with the 80-class COCO-pretrained model. Then we made a general dataset by combining the Marmot, ICDAR 19, and GitHub datasets, and the model was fine-tuned on it for just table detection (one class). We call this model the general model.
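To make this first stage concrete, here is a minimal sketch of the head-swapping step, using a torchvision Mask R-CNN purely as a stand-in (CascadeTabNet itself uses Cascade Mask R-CNN with an HRNet backbone via MMDetection; the exact settings are in the repo configs):

```python
# Sketch only: a torchvision Mask R-CNN stands in for the actual
# Cascade Mask R-CNN + HRNet model. The idea is the same: start from
# COCO-pretrained weights, replace the heads so they predict a single
# "table" class (plus background), then fine-tune on the general dataset.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_table_detector(num_classes=2):  # background + table
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)  # COCO weights
    in_feats = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)
    in_ch = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_ch, 256, num_classes)
    return model

general_model = build_table_detector()
# ...fine-tune general_model on the combined Marmot + ICDAR 19 + GitHub images...
```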

The general model was then fine-tuned separately for each specific dataset, e.g. TableBank, ICDAR 19, etc. The general model was also fine-tuned on the ICDAR 19 track B2 dataset for table structure recognition (3 classes).

We call this methodology iterative transfer learning; details about its effectiveness are elaborated in the paper.
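As a rough illustration (not the repo's actual training script, which goes through MMDetection's tools/train.py), the fine-tuning step that is repeated at each stage can be sketched like this, assuming a torchvision-style detection model and data loader as in the snippet above:

```python
# Hedged sketch of one fine-tuning stage; the same loop is reused for the
# general model, the dataset-specific models, and the 3-class structure model.
import torch

def finetune(model, data_loader, epochs=12, lr=0.0025, device='cuda'):
    model.to(device).train()
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=1e-4)
    for _ in range(epochs):
        for images, targets in data_loader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            losses = model(images, targets)  # torchvision detectors return a loss dict in train mode
            loss = sum(losses.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model

# Iterative transfer learning, end to end:
#   COCO weights  -> general model (1 class, combined dataset)
#   general model -> TableBank / ICDAR 19 specific detectors (1 class each)
#   general model -> ICDAR 19 B2 structure recognition (3 classes)
```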

kkissmart commented 4 years ago

Thanks for the answer! So you never trained on the TableBank data? Thanks

DevashishPrasad commented 4 years ago

We did train it on TableBank (on a subset of TableBank).

kkissmart commented 4 years ago

Sorry I am still confused there.

TableBank has almost 300K images, so why did you use only a subset of it for fine-tuning instead of including the entire dataset in your general dataset?

Thanks!

DevashishPrasad commented 4 years ago

Yes, we fine-tuned the general model on a subset of TableBank.

We did so mostly because of a domain adaptation issue. Different datasets contain different types of tables (different domains). We found that a model specifically fine-tuned for a particular dataset (a particular domain) attains higher accuracy than a generic model trained on multiple domains.

The exact experiment that verified this notion was: the accuracy of the general model on the ICDAR 19 dataset was lower than the accuracy of the general model after it was further fine-tuned on the ICDAR 19 dataset.
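For illustration, that comparison can be sketched with a standard COCO-style evaluation via pycocotools (file names here are hypothetical, and the ICDAR 19 cTDaR competition actually scores with its own F1-based metric):

```python
# Compare the general checkpoint against the ICDAR 19 fine-tuned checkpoint
# on the same ICDAR 19 test annotations (illustrative file names).
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO('icdar19_test_gt.json')                        # ground truth
for dets in ('general_model_dets.json', 'icdar19_finetuned_dets.json'):
    coco_dt = coco_gt.loadRes(dets)                           # detections per checkpoint
    ev = COCOeval(coco_gt, coco_dt, iouType='bbox')
    ev.evaluate()
    ev.accumulate()
    ev.summarize()                                            # compare the two AP summaries
```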

Also, as per the iterative transfer learning methodology, a general model is initially trained on a bigger dataset (for a generic task) and is then fine-tuned step by step on smaller, more specific datasets (demanding more specific tasks).

We did not provide ablation studies for this methodology and plan to publish our next paper on the same.

Exact details are mentioned in our paper.