LUMIA-Group / rasat

The official implementation of the paper "RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL" (EMNLP 2022)
https://arxiv.org/abs/2205.06983
Apache License 2.0

Why is table alignment necessary? #13

Closed tonyzhao6 closed 1 year ago

tonyzhao6 commented 1 year ago

align_tables.py seems to swap column names for certain databases.

For example, in the Spider dataset, the column names for the store_1 database are swapped. I understand that this is because of annotation issues in the original Spider dataset. However, I don't understand why a simple swap solves the issue.

Monstarrr commented 1 year ago

We swap those column names so that the original column names and the clean column names are in the same order. We use the clean version to generate the relations between the schema and the input, while using the original column names in the input itself.

Thus we make sure they are in the same order for us to generate the relations conveniently.
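
A minimal sketch of that idea, assuming the Spider `tables.json` layout (`table_names`, `table_names_original`, `column_names`, `column_names_original`). The toy schema and the helper `swap_clean_tables` are hypothetical and are not taken from align_tables.py; the sketch also assumes the original columns are grouped by table index in ascending order.

```python
import copy


def swap_clean_tables(schema, i, j):
    """Return a copy of `schema` with clean tables i and j exchanged.

    Hypothetical helper: only the clean fields are changed, so that they
    line up by index with the *_original fields.
    """
    aligned = copy.deepcopy(schema)

    # Exchange the two clean table names.
    names = aligned["table_names"]
    names[i], names[j] = names[j], names[i]

    # Point each clean column at the new index of its table
    # (-1 marks the special "*" column and is left alone).
    remap = {i: j, j: i}
    remapped = [[remap.get(t, t), name] for t, name in aligned["column_names"]]

    # Regroup columns by table index. sorted() is stable, so the order of
    # columns inside each table is preserved.
    aligned["column_names"] = sorted(remapped, key=lambda col: col[0])
    return aligned


# Toy schema (made up for illustration): the clean names list the two
# tables in the opposite order from the original names.
toy = {
    "table_names": ["artists", "sqlite sequence"],
    "table_names_original": ["sqlite_sequence", "artists"],
    "column_names": [[-1, "*"], [0, "id"], [0, "name"], [1, "seq"]],
    "column_names_original": [[-1, "*"], [0, "seq"], [1, "id"], [1, "name"]],
}

aligned = swap_clean_tables(toy, 0, 1)

# After the swap, position k in the clean list and position k in the
# original list refer to the same column.
for (t_clean, clean), (t_orig, orig) in zip(
    aligned["column_names"], aligned["column_names_original"]
):
    print(t_clean == t_orig, clean, "<->", orig)
```

Fields that refer to columns by index (for example `column_types`, `primary_keys`, `foreign_keys`) would need the same reordering; they are left out here to keep the sketch short.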

tonyzhao6 commented 1 year ago

Oh, I see. For the Spider dataset, there is exactly one database (store_1) where the elements in table_names and table_names_original do not have a 1:1 correspondence by position, for some reason. Thus, we have to manually swap the table names and their corresponding column names.

Likewise, CoSQL and SParC have comparable issues with other tables that have to be adjusted manually.
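
For anyone who wants to look for such cases themselves, here is a rough sketch of a positional check. It assumes the Spider-style `table_names` / `table_names_original` fields; the `normalize` heuristic and the function `find_misaligned_tables` are my own assumptions, not part of the repo. Clean names are sometimes reworded rather than just re-cased, so a hit is only a candidate for manual inspection.

```python
def normalize(name):
    """Crude normalization (an assumption): lowercase, underscores to spaces."""
    return name.lower().replace("_", " ").strip()


def find_misaligned_tables(schema):
    """Return the indices where the clean and original table names differ
    after normalization, comparing the two lists position by position."""
    pairs = zip(schema["table_names"], schema["table_names_original"])
    return [
        idx
        for idx, (clean, orig) in enumerate(pairs)
        if normalize(clean) != normalize(orig)
    ]


# Toy entries (made up): one with two tables in swapped order, one aligned.
swapped = {
    "table_names": ["artists", "sqlite sequence"],
    "table_names_original": ["sqlite_sequence", "artists"],
}
in_order = {
    "table_names": ["sqlite sequence", "artists"],
    "table_names_original": ["sqlite_sequence", "artists"],
}

print(find_misaligned_tables(swapped))
print(find_misaligned_tables(in_order))
```

Running this over every entry of a `tables.json` file would give a shortlist of databases to check by hand.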