google-research / tapas

End-to-end neural table-text understanding models.
Apache License 2.0

Generating Pre-Training Data for TAPAS #174

Open DominikKowieski opened 1 year ago

DominikKowieski commented 1 year ago

Hello,

I am trying to reproduce the full training process with German data. I have already collected data for fine-tuning, but I am struggling to understand how the pre-training data is obtained. Based on this link (https://github.com/google-research/tapas/blob/9f2163958d1a6ffa15b9ac346eebe0a140460fb9/PRETRAIN_DATA.md), I understand that one has to extract the data in proto text format and then convert it into TF examples with the "tapas/create_pretrain_examples_main.py" script. What I do not understand is how the proto text data itself was produced, and in particular how the question keys are supposed to be filled with values. Am I missing something? Thanks in advance.
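To make the question concrete, here is a minimal sketch of how I imagine one record could be written out. Everything in it is an assumption on my side rather than the documented pipeline: the field names (`id`, `table`, `columns`, `rows`, `cells`, `text`, `questions`, `original_text`) are my reading of the interaction proto, the idea that the question text is simply a text snippet that accompanies the table is a guess, the one-record-per-line layout is a guess, and `interaction_to_textproto` is a hypothetical helper, not part of the tapas package. The table content is made-up toy data.

```python
import json


def quote(text):
    # JSON string escaping is used here as an approximation of the
    # escaping rules of the proto text format (assumption).
    return json.dumps(text, ensure_ascii=False)


def interaction_to_textproto(interaction_id, header, rows, snippets):
    """Render one table plus its text snippets as a single-line record.

    Hypothetical helper: field names follow my reading of the
    interaction proto and may not match what the script expects.
    """
    column_parts = ["columns { text: %s }" % quote(h) for h in header]
    row_parts = []
    for row in rows:
        cells = " ".join("cells { text: %s }" % quote(c) for c in row)
        row_parts.append("rows { %s }" % cells)
    table = "table { %s }" % " ".join(column_parts + row_parts)

    question_parts = []
    for index, snippet in enumerate(snippets):
        question_id = "%s_%d" % (interaction_id, index)
        question_parts.append(
            "questions { id: %s original_text: %s }"
            % (quote(question_id), quote(snippet))
        )

    parts = ["id: %s" % quote(interaction_id), table] + question_parts
    return " ".join(parts)


# Toy German example (made-up data).
record = interaction_to_textproto(
    "beispiel-0",
    header=["Stadt", "Land"],
    rows=[["Berlin", "Deutschland"], ["Wien", "Österreich"]],
    snippets=["Liste von Hauptstädten"],
)
print(record)
```

If this is roughly the right shape, my remaining question is which text is meant to go into `original_text` for each table.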