This repository contains the source code for the TaBERT model, a pre-trained language model for learning joint representations of natural language utterances and (semi-)structured tables for semantic parsing. TaBERT is pre-trained on a massive corpus of 26M Web tables and their associated natural language context, and can be used as a drop-in replacement for a semantic parser's original encoder to compute representations for utterances and table schemas (columns).
Thanks for open-sourcing the project. In the paper, you describe the Content Snapshot as using n-gram matching to select the top-K rows. Is the code for that part in the repo? In your current implementation, you instead appear to randomly choose K rows for training, as shown in the snippet linked below.
https://github.com/facebookresearch/TaBERT/blob/cf5351c697773573a4fd857e3dde7f66cc6e6dd9/table_bert/vertical/input_formatter.py#L82-L83
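For reference, the row-selection strategy described in the paper could be approximated as below. This is a minimal sketch reconstructed from the paper's description, not the authors' implementation; the function names (`ngrams`, `snapshot_rows`) and the exact scoring (summed 1- to 3-gram overlap counts between the utterance and a row's cell tokens) are assumptions.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def snapshot_rows(utterance_tokens, rows, k, max_n=3):
    """Hypothetical content snapshot: rank table rows by n-gram overlap
    with the utterance and keep the top K.

    `rows` is a list of rows, each a list of cell strings. This is a
    sketch of the paper's description, not the repo's code.
    """
    utt_tokens = [t.lower() for t in utterance_tokens]

    def score(row):
        cell_tokens = [tok.lower() for cell in row for tok in cell.split()]
        total = 0
        for n in range(1, max_n + 1):
            # Counter intersection counts shared n-gram occurrences.
            overlap = ngrams(utt_tokens, n) & ngrams(cell_tokens, n)
            total += sum(overlap.values())
        return total

    return sorted(rows, key=score, reverse=True)[:k]
```

With this scoring, a row mentioning entities from the utterance would be ranked above an unrelated row, whereas the snippet linked above samples rows uniformly at random.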