This repository contains source code for the TaBERT model, a pre-trained language model for learning joint representations of natural language utterances and (semi-)structured tables for semantic parsing. TaBERT is pre-trained on a massive corpus of 26M Web tables and their associated natural language context, and can be used as a drop-in replacement for a semantic parser's original encoder to compute representations of utterances and table schemas (columns).
I found this work really interesting, and the collected and cleaned dataset could be very helpful for the research community.
Do you plan to release the 26M Web tables dataset publicly?
Thanks