BeachWang opened this issue 1 year ago
Besides, I am confused that DIN-SQL has similar Exact Match accuracies in Table 2 and Table 3 but two significantly different Exec accuracies.
Thank you so much for pointing these out. First, the Exec acc metric we are using to evaluate our model is the official metric published here: https://github.com/taoyds/test-suite-sql-eval. This metric, called "Exec acc", actually computes the test-suite accuracy, as stated in the repo itself: "This repo contains test suite evaluation metric". Thus we compared our method with the work of (Liu et al., 2023a) in terms of test-suite accuracy, and their reported test-suite accuracy is 60.1. Second, Table 2 contains the results of our method on the Spider test set, and Table 3 has the results on the Spider dev set.
I used the official metric to evaluate the dev-set results you published in the GPT4_results file, and the Exec acc results are 85.1 for DIN-SQL and 80.1 for few-shot. Perhaps you used different metrics in Table 2 and Table 3?
That's interesting; maybe there is a problem with the script we are using. Thank you so much for letting us know.
I also got a different score for the GPT4_results file.
@MohammadrezaPourreza Could I know what your script is?
I use https://github.com/taoyds/test-suite-sql-eval and follow its steps.
If I run a command like this:
python3 evaluation.py --gold ./my_test/gold_example.txt --pred ./my_test/din_sql_pred_sql.txt --db ./database/ --etype exec --plug_value
I got these:
And if I run a command like this:
python3 evaluation.py --gold ./my_test/gold_example.txt --pred ./my_test/din_sql_pred_sql.txt --db ./database/ --etype exec
then I got these:
Both 0.863 and 0.828 differ from the result in your paper.
I'm curious which part I ran incorrectly.
Thanks!
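As a side note, the gap between the two runs above (0.863 with `--plug_value` and 0.828 without) is consistent with what that flag is documented to do in the test-suite repo: plug the gold query's values into the predicted query before execution, so that value-prediction errors are forgiven. This is only a conceptual sketch of that effect, not the repo's actual implementation; the schema and queries are invented for illustration:

```python
import re
import sqlite3

def plug_values(pred_sql, gold_sql):
    # Conceptual sketch: replace each literal in the predicted query with the
    # corresponding literal from the gold query, position by position.
    # The real evaluator is much more careful about parsing than this regex.
    literal = re.compile(r"'[^']*'|\b\d+\b")
    gold_vals = iter(literal.findall(gold_sql))
    return literal.sub(lambda m: next(gold_vals, m.group(0)), pred_sql)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
db.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 25), ("Bo", 30)])

gold = "SELECT name FROM singer WHERE age > 28"
pred = "SELECT name FROM singer WHERE age > 20"  # right structure, wrong value

plain   = db.execute(pred).fetchall() == db.execute(gold).fetchall()
plugged = db.execute(plug_values(pred, gold)).fetchall() == db.execute(gold).fetchall()
print(plain, plugged)  # False True
```

So a prediction with the right SQL structure but a wrong literal fails the plain comparison yet passes once gold values are plugged in, which is why the score with `--plug_value` can only be equal or higher.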
It's interesting for me as well; many people told me they got different results on the dev set, and even those results were not consistent with each other. We are trying to figure out where the problem is.
Thank you for your great work @MohammadrezaPourreza. I got the same number of 82.8 as @amity871028. I am using the EX accuracy obtained with https://github.com/taoyds/test-suite-sql-eval, which I believe is also the evaluation script used for the Spider test set. I am guessing that you were using https://github.com/taoyds/spider/blob/master/evaluation.py for EX accuracy, which always produces a number a bit lower than the one from the test suite. I'm not sure, so ignore this if my guess is wrong.
Thank you for your reply! I will wait for your result.
Bro, can you help me take a look at my problem? After I run it, it just stays like this and I can't type anything.
It's probably a network issue.
I'm connected through a VPN, but it still doesn't work. Why is that?
Did you turn on global mode? Or you could try a proxy that forwards traffic from within China.
Global mode is on.
Bro, could you add me on WeChat and help me look at the problem? 15523313206. Much appreciated.
DIN-SQL uses GPT-4. Do you have a GPT-4 API key?
@MohammadrezaPourreza I also got different results when evaluating https://github.com/MohammadrezaPourreza/Few-shot-NL2SQL-with-prompting/blob/main/GPT4_results/DIN-SQL.csv! Is there any update on this issue?
This is how I formatted the files for evaluation:
din_sql_gold_evalformat.csv din_sql_prediction_evalformat.csv
My command:
python test-suite-sql-eval-master\evaluation.py --gold din_sql_gold_evalformat.csv --pred din_sql_prediction_evalformat.csv --etype exec --db .\database
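In case the file formatting is the culprit: the test-suite repo's instructions expect the gold file to contain one `SQL<TAB>db_id` pair per line and the prediction file one SQL query per line, aligned row by row. A minimal sketch of converting a results CSV into that layout; the column names (`gold`, `predicted`, `db_id`) are assumptions and may not match the actual DIN-SQL.csv headers:

```python
import csv

def write_eval_files(csv_path, gold_path, pred_path,
                     gold_col="gold", pred_col="predicted", db_col="db_id"):
    # Write the gold file as 'sql<TAB>db_id' lines and the prediction file as
    # bare SQL lines, keeping the two files aligned row by row.
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(gold_path, "w") as g, open(pred_path, "w") as p:
        for row in rows:
            g.write(f"{row[gold_col]}\t{row[db_col]}\n")
            # Multi-line SQL must be flattened so each prediction stays on one line.
            p.write(row[pred_col].replace("\n", " ") + "\n")
```

If the evaluator is fed CSVs with headers (rather than these plain-text files), the first "query" on each side is the header row, which would also shift every score.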
I see that Liu et al. report an Exec Acc of 70.1 in (Liu et al., 2023a), but your paper lists 60.1. Is this a mistake? Did you use the same evaluation code for Exec?