facebookresearch / TaBERT

This repository contains source code for the TaBERT model, a pre-trained language model for learning joint representations of natural language utterances and (semi-)structured tables for semantic parsing. TaBERT is pre-trained on a massive corpus of 26M Web tables and their associated natural language context, and can be used as a drop-in replacement for a semantic parser's original encoder to compute representations for utterances and table schemas (columns).

Content Snapshot Code Missing #23

Open waytehsu opened 3 years ago

waytehsu commented 3 years ago

https://github.com/facebookresearch/TaBERT/blob/cf5351c697773573a4fd857e3dde7f66cc6e6dd9/table_bert/vertical/input_formatter.py#L82-L83

Thanks for open-sourcing the project. In the paper, you describe Content Snapshot as using n-gram overlap to select the top-K rows. Do you have the code for that part in the repo? In your current implementation, you instead simply choose K rows at random for training, as shown in the snippet above.
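For reference, the paper's content-snapshot heuristic (rank table rows by n-gram overlap with the utterance, keep the top K) could be sketched roughly like this. This is an illustrative reimplementation, not the authors' code; the function names, tokenization, and choice of n-gram orders are assumptions:

```python
from collections import Counter


def ngrams(tokens, max_n):
    """All n-grams of order 1..max_n from a token list."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def select_top_k_rows(utterance_tokens, rows, k=3, max_n=3):
    """Rank rows by n-gram overlap with the utterance; keep the top K.

    `rows` is a list of rows, each a list of cell strings.
    Overlap is the multiset intersection of n-gram counts.
    """
    query_counts = Counter(ngrams([t.lower() for t in utterance_tokens], max_n))

    def overlap(row):
        row_tokens = [tok for cell in row for tok in cell.lower().split()]
        return sum((query_counts & Counter(ngrams(row_tokens, max_n))).values())

    return sorted(rows, key=overlap, reverse=True)[:k]
```

With this sketch, a row whose cells share more n-grams with the question would be ranked ahead of unrelated rows, rather than all rows being equally likely as in the random sampling above.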

pafitis commented 2 years ago

Hi — thanks for publishing this!

I was wondering if you are planning to add the content snapshot code.

Thanks for your time; very cool stuff!!