RUCKBReasoning / codes

The source code of CodeS (SIGMOD 2024).
https://arxiv.org/abs/2402.16347
Apache License 2.0

Spider results #4

Closed by lwmlyy 1 year ago

lwmlyy commented 1 year ago

Awesome work!

You mention: "For Spider, we adopt the execution accuracy (EX) and test-suite accuracy (TS) as the evaluation metrics." What is the difference between these two metrics? Are they both obtained with the evaluation code in Spider?

lihaoyang-ruc commented 1 year ago

Q: What is the difference between these two metrics? A: EX and TS are distinct evaluation metrics. TS is the more reliable of the two: by evaluating on "test suites", it greatly reduces the false positives that are common under EX. For an in-depth explanation, I highly recommend the paper "Semantic Evaluation for Text-to-SQL with Distilled Test Suites" (Zhong et al., EMNLP 2020).

Q: Are they both obtained with the evaluation code in Spider? A: EX and TS use the same evaluation code but differ in the databases they are evaluated on. EX uses the databases officially supplied with the Spider benchmark, whereas TS uses the test suites developed and released by Ruiqi Zhong et al.
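
To make the distinction concrete, here is a minimal toy sketch of the idea, not the actual Spider or test-suite evaluation script. It assumes a hypothetical `singer` table, hypothetical gold and predicted queries, and a simplified result comparison (an order-insensitive multiset of rows). The predicted query matches the gold query on the single "official" database, so EX counts it as correct, but a second database in the test suite exposes the difference.

```python
import sqlite3
from collections import Counter


def run(rows, sql):
    """Execute `sql` on a fresh in-memory database holding `rows`.

    The `singer` table and its columns are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    # Compare results as multisets so row order does not matter.
    return Counter(result)


def matches(rows, gold_sql, pred_sql):
    """True if both queries return the same rows on this database."""
    return run(rows, gold_sql) == run(rows, pred_sql)


gold = "SELECT name FROM singer WHERE age > 30"
pred = "SELECT name FROM singer WHERE age >= 30"  # subtly wrong

# EX-style check: a single database, standing in for the official one.
official_db = [("Ann", 25), ("Bob", 41)]
ex_correct = matches(official_db, gold, pred)

# TS-style check: the prediction must agree with gold on every database
# in the suite. The second database contains a row with age == 30.
test_suite = [official_db, [("Cid", 30), ("Dee", 52)]]
ts_correct = all(matches(db, gold, pred) for db in test_suite)

print("EX correct:", ex_correct)
print("TS correct:", ts_correct)
```

In this toy setup the prediction passes the EX-style check and fails the TS-style check, which is the kind of false positive the answer above refers to. The real evaluation scripts are considerably more careful about how results are compared.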