tonyzhao6 closed this issue 1 year ago.
We swap those column names so that the original column names and the clean column names appear in the same order. We use the clean names to generate the relations between the schema and the input, while the original names are what we actually put in the input.
Keeping both lists in the same order means a column at position i in one list is the same column at position i in the other, which lets us generate the relations conveniently.
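A minimal sketch of why the order matters, assuming relations are computed by matching clean names and the result is then read from the original-name list at the same index. The function `link_columns` and the toy column names are hypothetical illustrations, not code from the repository.

```python
def link_columns(question, clean_names, original_names):
    """Return the original column names whose clean name occurs in the question.

    Matching uses clean_names; the answer is looked up in original_names by
    the same index, so the two lists must be in the same order.
    """
    q = question.lower()
    linked = []
    for i, clean in enumerate(clean_names):
        if clean.lower() in q:  # naive substring match, for illustration only
            linked.append(original_names[i])
    return linked


question = "what is the age of each singer"
clean = ["name", "age"]
print(link_columns(question, clean, ["Name", "Age"]))  # aligned lists
print(link_columns(question, clean, ["Age", "Name"]))  # misaligned lists
```

With aligned lists the match on "age" yields `Age`; with misaligned lists the same match silently yields `Name`, which is the kind of error that reordering prevents.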
Oh, I see. In the Spider dataset there is exactly one database (`store_1`) where the elements of `table_names` and `table_names_original` do not have a 1:1 correspondence, for some reason. Thus, we have to manually swap the table names and their corresponding column names.
Likewise, there are similar issues in CoSQL and SParC that have to be adjusted manually.
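A minimal sketch of such a manual swap, assuming a schema entry shaped like Spider's `tables.json` (`table_names`, `table_names_original`, and column lists of `[table_index, name]` pairs with `[-1, "*"]` first). The entry below is a toy example, not the real `store_1` data, and `swap_tables` / `is_aligned` are hypothetical helpers, not necessarily what `align_tables.py` does.

```python
import copy


def swap_tables(schema, i, j):
    """Swap clean tables i and j and re-point the clean columns to match.

    Only the clean-name fields change; the *_original fields are left as-is.
    """
    fixed = copy.deepcopy(schema)
    names = fixed["table_names"]
    names[i], names[j] = names[j], names[i]
    remap = {i: j, j: i}  # -1 (the "*" column) is left untouched
    fixed["column_names"] = [[remap.get(t, t), col] for t, col in fixed["column_names"]]
    return fixed


def is_aligned(schema):
    """True if every clean column points at the same table index as its original."""
    return all(
        c[0] == o[0]
        for c, o in zip(schema["column_names"], schema["column_names_original"])
    )


# Toy entry: the clean table list is in the opposite order from the original one.
toy = {
    "table_names": ["album", "artist"],
    "table_names_original": ["Artist", "Album"],
    "column_names": [[-1, "*"], [1, "artist id"], [1, "name"], [0, "album id"], [0, "title"]],
    "column_names_original": [[-1, "*"], [0, "ArtistId"], [0, "Name"], [1, "AlbumId"], [1, "Title"]],
}

fixed = swap_tables(toy, 0, 1)
print(is_aligned(toy), is_aligned(fixed))
```

Swapping only the table names would leave every clean column pointing at the wrong table, which is why the column indices have to be remapped together with the names.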
`align_tables.py` seems to swap column names for certain databases. For example, in the Spider dataset, the column names for the `store_1` database are swapped. I understand that this is because of annotation issues in the original Spider dataset. However, I don't understand why a simple swap solves this issue.