Lillianwei-h / CToT

Code release for paper Generating Chain-of-Thoughts with a Direct Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought
https://arxiv.org/abs/2402.06918

Consultation #1

Open cyf20 opened 1 month ago

cyf20 commented 1 month ago

Hello author, is this code complete? I am very interested in this paper and the code and would like to study it; is that OK? Please reply when you see this.

Lillianwei-h commented 1 month ago

Yes, the code is complete as long as you put your OpenAI API key in api_key.yaml. But be aware that CToT uses far more tokens than normal CoT.

cyf20 commented 1 month ago

Thank you very much for your response. Could you please let me know how high the CToT cost is? May I reach out to you again if I have any further questions? Once again, I appreciate your reply!

Lillianwei-h commented 1 month ago

It costs about 30 to 50 times as much as CoT, depending on your parameters (including n_select_sample and max_round). Using the latest OpenAI models may reduce the cost.
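To put the multiplier in perspective, here is a back-of-envelope estimate. Only the 30-50x range comes from this thread; the per-problem token count and the price are illustrative placeholders.

```python
# Back-of-envelope cost comparison. All numbers except the 30-50x
# multiplier are hypothetical placeholders, not measured values.
cot_tokens_per_problem = 1_000      # assumed CoT token usage per problem
multiplier = 40                     # midpoint of the reported 30-50x range
price_per_1k_tokens = 0.01          # placeholder; check current OpenAI pricing
ctot_cost = cot_tokens_per_problem * multiplier / 1_000 * price_per_1k_tokens
print(f"~${ctot_cost:.2f} per problem with CToT")
```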

cyf20 commented 1 month ago

Thank you once again for your reply. Much appreciated!

Lillianwei-h commented 1 month ago

You are welcome~

cyf20 commented 4 weeks ago

Hello, author. May I ask if it's possible for us to exchange contact information? I have a few questions I'd like to ask for your guidance. Thank you very much!

cyf20 commented 4 weeks ago

Hello, author. In the game24 experiment, I did not see an accuracy figure being generated. In the AQUA experiment, after running it several times, I found that the accuracy differs significantly from what is reported in the paper. Could you please advise whether there might be an issue here? I would appreciate your guidance. Thank you!

Lillianwei-h commented 4 weeks ago

For the first question, you need to check the filtered answers in the ./outcomes dir and count the accuracy manually. For the second question, you need to modify the comparison prompts into few-shot ones; for now they are simplified as templates for general tasks. You may also change lines 301 to 304 in tot.py to continue_zs = continue_zs + remained_zs, but this will greatly increase token usage. I am sorry for the inconvenience.
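The suggested edit can be illustrated in isolation. The variable names follow the comment above, but the values are made-up stand-ins, not the repo's actual code:

```python
# Illustrative only: continue_zs / remained_zs follow the names in the
# maintainer's comment; the contents are placeholder thoughts.
continue_zs = ["thought A", "thought B"]   # thoughts selected for the next round
remained_zs = ["thought C"]                # candidates that would otherwise be dropped
# The proposed change keeps the dropped candidates in the search as well,
# widening the beam at the price of more tokens per round:
continue_zs = continue_zs + remained_zs
print(continue_zs)  # ['thought A', 'thought B', 'thought C']
```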

cyf20 commented 3 weeks ago

Hello, author. I'm here to bother you again, haha. In this paper, where are the Standard method and the Dueling method reflected in the code? I haven't been able to find them. Is the Standard method implemented in the compare function? I am currently studying AQUA.


cyf20 commented 3 weeks ago

Hello, author. The cost of running the AQUA dataset is really high; just running it once costs nearly $20. I adjusted the parameters according to the experimental settings in the paper, but the accuracy is still significantly lower than what is presented in the paper. What could be the reason for this? Could you please provide some guidance? Is it possible to exchange contact information? I would greatly appreciate it! Thank you!

Lillianwei-h commented 3 weeks ago

Hi, you can contact me via my email lillianwei423@gmail.com. Thank you for your interest in our work!

Lillianwei-h commented 3 weeks ago

The code we released is the standard version, even though the function is named dueling, haha. We'll fix this naming error in the next release.