accuracy only 0.659 when use gpt-3.5 - Githubissues

MohammadrezaPourreza / Few-shot-NL2SQL-with-prompting

MIT License

301 stars, 59 forks

accuracy only 0.659 when use gpt-3.5 #13

Closed justforsoy closed 1 year ago

justforsoy commented 1 year ago

Since GPT-4 is expensive and Codex is deprecated, I used gpt-3.5 to test this method. I got these scores:

                         easy     medium   hard     extra    all
    count                248      446      174      166      1034
    =====================   EXECUTION ACCURACY   =====================
    execution            0.742    0.720    0.546    0.488    0.659

That's much worse than GPT-4 or Codex. Do you have any ideas to make it better?

justforsoy commented 1 year ago

Then I tested zero-shot:

                         easy     medium   hard     extra    all
    count                248      446      174      166      1034
    =====================   EXECUTION ACCURACY   =====================
    execution            0.899    0.798    0.655    0.506    0.751

I'm confused...

justforsoy commented 1 year ago

And few-shot:

                         easy     medium   hard     extra    all
    count                248      446      174      166      1034
    =====================   EXECUTION ACCURACY   =====================
    execution            0.895    0.800    0.615    0.404    0.728
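As a sanity check on the three runs, the "all" column should be the count-weighted average of the per-difficulty accuracies. Here is a minimal sketch of that arithmetic; the dictionary names and the `overall_accuracy` helper are made up for illustration, and the numbers are copied from the tables in this thread.

```python
# Number of questions per difficulty level, as reported by the evaluator.
COUNTS = {"easy": 248, "medium": 446, "hard": 174, "extra": 166}

# Per-difficulty execution accuracy for the three runs reported above.
RUNS = {
    "this method (gpt-3.5)": {"easy": 0.742, "medium": 0.720, "hard": 0.546, "extra": 0.488},
    "zero-shot": {"easy": 0.899, "medium": 0.798, "hard": 0.655, "extra": 0.506},
    "few-shot": {"easy": 0.895, "medium": 0.800, "hard": 0.615, "extra": 0.404},
}


def overall_accuracy(per_level, counts=COUNTS):
    """Count-weighted average of the per-difficulty accuracies."""
    total = sum(counts.values())
    return sum(per_level[level] * counts[level] for level in counts) / total


for name, per_level in RUNS.items():
    print(f"{name}: {overall_accuracy(per_level):.3f}")
```

The recomputed values match the reported "all" column (0.659, 0.751, 0.728), so the gap is spread across every difficulty level and is not a reporting artifact.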

ShiXiangXiang123 commented 1 year ago

How did you switch it to 3.5? Did you just change the key on line 444 of DIN-SQL.py?

justforsoy commented 1 year ago

> How did you switch it to 3.5? Did you just change the key on line 444 of DIN-SQL.py?

    def GPT4_generation(prompt):
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            n=1,
            stream=False,
            temperature=0.0,
            max_tokens=600,
            top_p=1.0,
            frequency_penalty=0.0,
            presence_penalty=0.0,
            stop=["Q:"],
        )
        return response['choices'][0]['message']['content']

Change "gpt-4" to "gpt-3.5-turbo".
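One way to avoid editing the string by hand each time is to make the model name a parameter. This is only a sketch: `build_chat_request` is a hypothetical helper, not something in the repo, and it just assembles the same keyword arguments used in the call above.

```python
def build_chat_request(prompt, model="gpt-3.5-turbo"):
    """Build keyword arguments for a chat completion call.

    Hypothetical helper: the settings mirror the GPT4_generation call
    quoted above, with only the model name made configurable.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "n": 1,
        "stream": False,
        "temperature": 0.0,
        "max_tokens": 600,
        "top_p": 1.0,
        "frequency_penalty": 0.0,
        "presence_penalty": 0.0,
        "stop": ["Q:"],
    }


# With the legacy (pre-1.0) openai package used in the quoted code,
# the call would then look like this:
# response = openai.ChatCompletion.create(**build_chat_request(prompt))
# text = response["choices"][0]["message"]["content"]
```

Passing `model="gpt-4"` reproduces the original request, so both models can be compared without touching the rest of the script.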

ShiXiangXiang123 commented 1 year ago

Thanks, that's how I changed it too. I appreciate your reply.

ShiXiangXiang123 commented 1 year ago

> Change "gpt-4" to "gpt-3.5-turbo".

Mine just hangs like this and never moves on. Could I take a look at yours? I've been changing things for a long time and still can't find the cause, and I can't type any input either.

ShiXiangXiang123 commented 1 year ago

Could I add you on WeChat or QQ so you can help me take a look? Please.

ShiXiangXiang123 commented 1 year ago

I just need to swap the API key for this one, right? Mine is a personal key; is that OK?

justforsoy commented 1 year ago

> I used gpt-3.5 to test this method and got an overall execution accuracy of 0.659. That's much worse than GPT-4 or Codex. Do you have any ideas to make it better?

I found the problem: gpt-3.5-turbo's maximum context size is 4,096 tokens, and the prompt is too long for it. For comparison, gpt-4 allows 8,192 tokens and code-davinci-002 allows 8,001 tokens.
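To see which prompts would overflow, a rough pre-flight check like the one below might help. It is a sketch under stated assumptions: the context limits are the ones quoted in this thread, the 600-token completion budget comes from `max_tokens=600` in the API call, and `rough_token_count` is a crude estimate of about 4 characters per token. A real tokenizer (for example the `tiktoken` package) would give exact counts, and the few tokens of chat-message overhead are ignored here.

```python
# Context window sizes as quoted in this thread (prompt + completion share it).
CONTEXT_LIMITS = {
    "gpt-3.5-turbo": 4096,
    "gpt-4": 8192,
    "code-davinci-002": 8001,
}

# Tokens reserved for the answer; matches max_tokens=600 in the API call.
MAX_COMPLETION_TOKENS = 600


def rough_token_count(text):
    """Crude estimate (assumption): about 4 characters per token."""
    return (len(text) + 3) // 4


def prompt_budget(model, max_completion_tokens=MAX_COMPLETION_TOKENS):
    """Tokens left for the prompt after reserving room for the completion."""
    return CONTEXT_LIMITS[model] - max_completion_tokens


def fits(prompt, model, count_tokens=rough_token_count):
    """True if the estimated prompt length fits within the model's budget."""
    return count_tokens(prompt) <= prompt_budget(model)


if __name__ == "__main__":
    for model in CONTEXT_LIMITS:
        print(model, "prompt budget:", prompt_budget(model), "tokens")
```

With these numbers gpt-3.5-turbo leaves roughly 3,500 tokens for the prompt, less than half of what gpt-4 leaves, so long few-shot prompts may need fewer examples or a trimmed schema to fit.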