This repository contains source code for the TaBERT model, a pre-trained language model for learning joint representations of natural language utterances and (semi-)structured tables for semantic parsing. TaBERT is pre-trained on a massive corpus of 26M Web tables and their associated natural language context, and can be used as a drop-in replacement for a semantic parser's original encoder to compute representations of utterances and table schemas (columns).
I found this work really interesting, and the collected and cleaned dataset could be very helpful for the research community.
Do you plan to release the 26M Web tables dataset publicly?
Thanks