Question about table representation for tasks like NL2SQL

facebookresearch / TaBERT

This repository contains source code for the TaBERT model, a pre-trained language model for learning joint representations of natural language utterances and (semi-)structured tables for semantic parsing. TaBERT is pre-trained on a massive corpus of 26M Web tables and their associated natural language context, and can be used as a drop-in replacement for a semantic parser's original encoder to compute representations for utterances and table schemas (columns).

@pcyin Hi, I have two questions about the Spider experiments.

1. In text-to-SQL for a database with multiple tables, we also need a representation for each table, not just for its columns. The paper mentions that the table representation is obtained from the prefix [CLS] token. Is the table name taken into account in this step? Right now the input to TaBERT has no table name/caption part, so did you concatenate the table name with each column name? Otherwise it seems hard to differentiate same-named columns in different tables, e.g., the name column in the tables actor and director.
2. Is the [CLS] representation part of the context_encoding returned by the model, as shown in the example? It seems the [CLS] token is treated as part of the context.
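
To make the first question concrete, here is a minimal sketch of the kind of concatenation I have in mind. It is plain Python and does not call TaBERT at all; the helper `qualify_column_names`, the separator, and the toy schema are hypothetical and only illustrate how prefixing the table name would make the two name columns distinct. I am not assuming this is what was actually done in the paper.

```python
from typing import Dict, List


def qualify_column_names(schema: Dict[str, List[str]], sep: str = " ") -> Dict[str, List[str]]:
    """Prefix every column name with the name of its table.

    Hypothetical helper for illustration only; it is not part of the TaBERT API.
    """
    return {
        table: [f"{table}{sep}{column}" for column in columns]
        for table, columns in schema.items()
    }


# Toy schema: both tables contain a column called "name".
schema = {
    "actor": ["id", "name"],
    "director": ["id", "name"],
}

qualified = qualify_column_names(schema)
print(qualified["actor"])     # ['actor id', 'actor name']
print(qualified["director"])  # ['director id', 'director name']
```

With names qualified this way, the two name columns would reach the encoder as different strings, which is the behaviour I am asking about.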

It would be very helpful if you could help clarify this. Thanks!

Question about table representation for tasks like NL2SQL #16