Open yigengjiang opened 1 week ago
Hi Yigengjiang,
Thank you for spotting the error and sorry for any inconvenience.
I checked the code and found a typo in the prompt construction that caused this issue. I have fixed it and re-ran the first 30 instances, which gave an accuracy of 70%. Please try again; you should get a similar result.
Thanks.
Thank you for your response! I have another question regarding the ablation study. Could you clarify why there wasn't an ablation study conducted with and without the translator module? It seems that the translator plays a pivotal role in your method.
Hi Yigengjiang,
There is an ablation with and without the translator. As mentioned in Section 4.3 (model ablation), the translator contributes an improvement of 6.3% on average. In Fig. 3 of the paper, this contribution is the number on the grey bar (SymbCoT without planner&solver&verifier, i.e. the translator alone) minus the number on the pink bar (SymbCoT without translator&planner&solver&verifier, i.e. with no translator or any other module).
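For readers who want to redo that subtraction themselves, here is a minimal sketch of the calculation. The function name and the dictionary layout are hypothetical and are not taken from the repository. The per-dataset accuracies have to be read off Fig. 3, and none are filled in here.

```python
def average_contribution(with_module, without_module):
    """Return (mean gain, per-dataset gains) for one module.

    Both arguments map a dataset name to an accuracy in percent, e.g. the
    grey-bar values (translator only) and the pink-bar values (no modules).
    This dictionary layout is an assumption made for illustration.
    """
    shared = sorted(set(with_module) & set(without_module))
    if not shared:
        raise ValueError("the two settings share no dataset")
    # Gain on each dataset is simply "with" minus "without".
    gains = {name: with_module[name] - without_module[name] for name in shared}
    return sum(gains.values()) / len(gains), gains
```

Calling it with the grey-bar values as the first argument and the pink-bar values as the second gives the per-dataset gains and their mean, which can then be compared against the 6.3% quoted above.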
Thanks.
Description
I encountered an issue in the code when running the evaluation script. Below are the details of the issue and the steps I took to investigate and attempt a fix.
Steps to Reproduce
I executed the evaluate.sh script.
Outputs
The script outputs:
Only 104 records were evaluated because running gpt-3.5-turbo for logic inference exceeded my budget.
Attempted Fix
I modified the following line in evaluate.py:
After modifying the code, I executed evaluate.sh again, resulting in:
Even after the modification, the accuracy (24.04%) is still far lower than the result reported in the paper (75.8%). The evaluation covers only 104 records, but the discrepancy seems too large to be accounted for by the sample size alone.
Request
Could you please investigate this issue further? It seems there might be an underlying problem affecting the evaluation accuracy.
Thank you for your assistance.