facebookresearch / TaBERT

This repository contains source code for the TaBERT model, a pre-trained language model for learning joint representations of natural language utterances and (semi-)structured tables for semantic parsing. TaBERT is pre-trained on a massive corpus of 26M Web tables and their associated natural language context, and can be used as a drop-in replacement for a semantic parser's original encoder to compute representations for utterances and table schemas (columns).

Preparing training data #1

Closed psorianom closed 3 years ago

psorianom commented 4 years ago

Hi! Thank you for your work. I find it very interesting.

I want to train a custom TaBERT using my own tables and contexts, but I am not sure how to proceed. I would like to know more about the training dataset format. Should the dataset already be linearized, as described in the paper? Could you share a sample of the dataset you used, so we can get a glimpse of it?

Thanks for any insight you may have!

pcyin commented 3 years ago

Thanks for your interest! I'll prepare a minimal working example ASAP.

psorianom commented 3 years ago

Thank you!

monk1337 commented 3 years ago

+1

Sharathmk99 commented 3 years ago

+1

Vincentchhsu commented 3 years ago

Hi, I need it too, thanks.

DevHyung commented 3 years ago

Hi, I need it too, thanks.

pcyin commented 3 years ago

Hey!

Sorry for the delay! I've updated the repo with code for extracting tables from CommonCrawl and Wikipedia, training data generation, and model training.
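For readers wondering about the linearization discussed above: the TaBERT paper represents each table row by rendering every cell as "column name | column type | cell value" and concatenating the cells with the utterance. The sketch below is a minimal, hypothetical illustration of that scheme, not the repo's actual data-generation code; the function name and separator choices are assumptions.

```python
def linearize_row(utterance, header, row):
    """Render one table row in the style described in the TaBERT paper.

    `header` is a list of (column_name, column_type) pairs; `row` is the
    list of cell values for one row. Each cell becomes the string
    "column | type | value", and cells are joined with [SEP] after the
    utterance. The exact special-token layout here is an assumption.
    """
    cells = [f"{col} | {typ} | {val}" for (col, typ), val in zip(header, row)]
    return "[CLS] " + utterance + " [SEP] " + " [SEP] ".join(cells) + " [SEP]"


example = linearize_row(
    "which city hosted the 2008 olympics?",
    [("Year", "real"), ("City", "text")],
    ["2008", "Beijing"],
)
# → "[CLS] which city hosted the 2008 olympics? [SEP] Year | real | 2008 [SEP] City | text | Beijing [SEP]"
```

For the actual on-disk format expected by the training scripts, see the data-generation code now in the repo.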