Open alomrani opened 10 months ago
Hi,
I am experiencing a similar issue, with my results hovering around 0.69 for WebQSP and 0.37 for CWQ. I would greatly appreciate it if the authors could provide some insight into the challenges of reproducing the results.
Best regards, Liyi
Hi,
I am also facing the same issue. I run the experiment for CWQ twice and got around 37% accuracy for gpt-3.5, compared to 57.1% mentioned in the paper. Could you please provide some suggestions in reproducing the result of the paper.
Best, Qifan
Hi, Sorry for the late reply, we did not save the previous results, but here are some tips to reproduce the results of the paper:
eval.py
file has some problems, and we will fix them as soon as possible.alias
that we built and will be updated later.Thank you very much for your reply! I have already corrected the retrieval code and adjusted the version of ChatGPT. However, my experimental results did not improve much and are similar to the previous ones. I hope the alias file can be provided and the eval file can be corrected for reproduction as soon as possible.
Best, Liyi
Hi, Sorry for the late reply, we did not save the previous results, but here are some tips to reproduce the results of the paper:
- The current version of the
eval.py
file has some problems, and we will fix them as soon as possible.- The chatgpt model we use is gpt-3.5-turbo-0613, and the performance may fluctuate slightly from the current updated model.
- CWQ test is a file with
alias
that we built and will be updated later.
Which version of GPT-4 did you use?
Hi, Sorry for the late reply, we did not save the previous results, but here are some tips to reproduce the results of the paper:
- The current version of the
eval.py
file has some problems, and we will fix them as soon as possible.- The chatgpt model we use is gpt-3.5-turbo-0613, and the performance may fluctuate slightly from the current updated model.
- CWQ test is a file with
alias
that we built and will be updated later.Which version of GPT-4 did you use? Hi,
We use gpt-4-0613 for all the experiments setting.
非常感谢您的回复!我已经更正了检索代码并调整了 ChatGPT 的版本。然而,我的实验结果并没有太大的改善,并且与以前的结果相似。我希望可以提供别名文件,并且可以尽快更正 eval 文件以进行复制。
最好的,丽艺
你好,我在复现代码的过程中遇到了一些困难,你能指点我一下吗 感谢
非常感谢您的回复!我已经更正了检索代码并调整了 ChatGPT 的版本。然而,我的实验结果并没有太大的改善,并且与以前的结果相似。我希望可以提供别名文件,并且可以尽快更正 eval 文件以进行复制。 最好的,丽艺
你好,我在复现代码的过程中遇到了一些困难,你能指点我一下吗 感谢
你好,我也在复现过程中遇到了一些问题,可以一起交流一下吗?非常感谢
Thank you very much for your reply! I have already corrected the retrieval code and adjusted the version of ChatGPT. However, my experimental results did not improve much and are similar to the previous ones. I hope the alias file can be provided and the eval file can be corrected for reproduction as soon as possible.
Best, Liyi
Hi,
I am facing the same issues with results 0.69 for WebQSP after correcting the eval codes. Did you solve the problem? Any guidance you can provide would be greatly appreciated.
Best, Zhoutian
Hi there,
After running your code, I am getting 67 EM for WebQSP with gpt-3.5-turbo compared to 76 EM reported in the paper. I was wondering if you can share your results file for comparison.
Thanks, Mohammad