RUCKBReasoning / codes

The source code of CodeS (SIGMOD 2024).
https://arxiv.org/abs/2402.16347
Apache License 2.0

Regarding schema filtering #19

Closed Hari-Dorbala closed 1 week ago

Hari-Dorbala commented 3 months ago

Hi. I am trying to understand how to use the schema filtering part. Given the entire schema of a database, how can I get the tables and columns relevant to a given NL query, and how should I prepare my data for that? I have seen that RoBERTa is used in schema_filter.py, but I cannot work out how the data needs to be prepared. I have done the labeling, but the data is highly imbalanced; how should I approach this problem? Can you please explain how schema filtering is handled in CodeS?

lihaoyang-ruc commented 3 months ago

The schema filter is not a part of the pre-trained CodeS model. We have released it as an independent component. For further details, please visit the GitHub repository at: https://github.com/RUCKBReasoning/text2sql-schema-filter

For the problem of label imbalance in training, please refer to our other paper, RESDSQL (https://arxiv.org/abs/2302.05965), which alleviates the bias by using focal loss.
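
For readers unfamiliar with focal loss, here is a minimal sketch of the standard binary formulation, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), applied to per-item relevance labels (1 = table/column is relevant, 0 = not). This is not the code from schema_filter.py or RESDSQL; the function names, the `alpha` and `gamma` values, and the toy probabilities are assumptions chosen only for illustration.

```python
import math


def cross_entropy(prob, label, eps=1e-12):
    """Plain binary cross-entropy for one schema item."""
    p_t = prob if label == 1 else 1.0 - prob
    return -math.log(max(p_t, eps))


def binary_focal_loss(prob, label, alpha=0.75, gamma=2.0, eps=1e-12):
    """Focal loss for one schema item (hypothetical helper).

    prob  : predicted probability that the item is relevant
    label : 1 if the item is relevant to the question, else 0
    alpha : weight on the positive class (assumed value, tune for your data)
    gamma : focusing parameter; gamma = 0 reduces to weighted cross-entropy
    """
    p_t = prob if label == 1 else 1.0 - prob          # probability of the true class
    alpha_t = alpha if label == 1 else 1.0 - alpha    # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))


# Toy imbalanced example: nine easy irrelevant columns, one hard relevant column.
probs = [0.05] * 9 + [0.30]
labels = [0] * 9 + [1]

ce_losses = [cross_entropy(p, y) for p, y in zip(probs, labels)]
fl_losses = [binary_focal_loss(p, y) for p, y in zip(probs, labels)]

# Share of the total loss that comes from the single positive item.
ce_share = ce_losses[-1] / sum(ce_losses)
fl_share = fl_losses[-1] / sum(fl_losses)
print(f"positive share of loss: CE={ce_share:.3f}, focal={fl_share:.3f}")
```

The (1 - p_t)^gamma factor shrinks the loss of items the classifier already gets right, so the many easy negatives contribute far less to the total than they do under plain cross-entropy, and the rare relevant items dominate the gradient.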