Open wangzhihao-coder opened 1 month ago
I ran into the same problem too.
Hi, same for me, for both 1.5B and 7B. My numbers so far:

| Model | MATH CoT | MATH CoT (maj@8) | MATH TIR |
|---|---|---|---|
| Qwen2.5-1.5B-Instruct | 0.7646 | N/A | 0.7534 |
| Qwen2.5-7B-Instruct | 0.8408 | 0.8804 | 0.8088 |
The CoT numbers are a bit higher than reported because I used MARIO_EVAL to compare results with ground truth. I'm using vLLM for inference, with `top_p=1.0` and `temperature=0`.
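For anyone trying to reproduce the maj@8 column, here is a minimal sketch of how I understand majority voting over sampled completions. It is not MARIO_EVAL: answers are compared as plain strings after extracting the last `\boxed{...}`, and the helper names `extract_boxed` and `majority_vote` are my own, not from any Qwen repo. Voting only makes sense when the samples differ, so the completions would have to be sampled with a non-zero temperature.

```python
from collections import Counter
from typing import List, Optional

BOXED = "\\boxed{"


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    start = text.rfind(BOXED)
    if start == -1:
        return None
    i = start + len(BOXED)
    depth = 1  # we are inside the opening brace of \boxed{
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
        i += 1
    return None  # braces never closed


def majority_vote(samples: List[str]) -> Optional[str]:
    """Most common boxed answer across samples (string match only)."""
    answers = [a for a in map(extract_boxed, samples) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]
```

A real evaluator would normalize mathematically equivalent answers (for example `0.5` and `\frac{1}{2}`) before counting, which this sketch does not attempt.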
I checked a few failure cases and they seemed strange to me. Take, for example, problem 50 in the MATH-test benchmark. The model used a function, `sympy.complete_square`, which clearly does not exist:
<|im_start|>system
Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \boxed{}.<|im_end|>
<|im_start|>user
Find the radius of the circle with equation $x^2 + 8x + y^2 - 6y = 0$.<|im_end|>
<|im_start|>assistant
To find the radius of the circle given by the equation \(x^2 + 8x + y^2 - 6y = 0\), we need to rewrite the equation in the standard form of a circle's equation \((x - h)^2 + (y - k)^2 = r^2\). This process is called completing the square.
Let's break it down step-by-step:
1. Start with the given equation:
\[
x^2 + 8x + y^2 - 6y = 0
\]
2. Complete the square for the \(x\) terms. The coefficient of \(x\) is 8, so we take half of 8 (which is 4) and square it (which is 16). Add and subtract 16 inside the equation:
\[
x^2 + 8x + 16 - 16 + y^2 - 6y = 0
\]
This can be rewritten as:
\[
(x + 4)^2 - 16 + y^2 - 6y = 0
\]
3. Next, complete the square for the \(y\) terms. The coefficient of \(y\) is -6, so we take half of -6 (which is -3) and square it (which is 9). Add and subtract 9 inside the equation:
\[
(x + 4)^2 - 16 + y^2 - 6y + 9 - 9 = 0
\]
This can be rewritten as:
\[
(x + 4)^2 + (y - 3)^2 - 25 = 0
\]
4. Finally, move the constant term to the right side of the equation:
\[
(x + 4)^2 + (y - 3)^2 = 25
\]
Now the equation is in the standard form \((x - h)^2 + (y - k)^2 = r^2\), where \((h, k)\) is the center of the circle and \(r\) is the radius. From the equation \((x + 4)^2 + (y - 3)^2 = 25\), we can see that the center of the circle is \((-4, 3)\) and the radius \(r\) is \(\sqrt{25} = 5\).
So, the radius of the circle is \(5\).
Let's confirm this with Python code.
```python
import sympy as sp
# Define the variables
x, y = sp.symbols('x y')
# Define the equation of the circle
equation = x**2 + 8*x + y**2 - 6*y
# Complete the square for x and y terms
x_term = sp.complete_square(x**2 + 8*x, full=True)[0]
y_term = sp.complete_square(y**2 - 6*y, full=True)[0]
# Rewrite the equation
completed_square_eq = x_term + y_term - sp.complete_square(x**2 + 8*x, full=True)[1] - sp.complete_square(y**2 - 6*y, full=True)[1]
# Simplify the equation
completed_square_eq = sp.simplify(completed_square_eq)
# Extract the radius
radius_squared = completed_square_eq.rhs
radius = sp.sqrt(radius_squared)
print(radius)
```
Does the Qwen team have an in-house version of SymPy that contains functions that do not exist in the main repo?
The Python code could be hallucinated. Doesn't that explain the function that cannot be found?
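For what it's worth, the hand derivation in that transcript is correct, and the check does not need any `complete_square` helper. A minimal sketch in plain Python, assuming the circle is written as x^2 + d*x + y^2 + e*y + f = 0 with integer coefficients (the function name `circle_from_general_form` is just for illustration): completing the square gives center (-d/2, -e/2) and r^2 = (d/2)^2 + (e/2)^2 - f.

```python
import math
from fractions import Fraction


def circle_from_general_form(d: int, e: int, f: int):
    """Center and radius of x^2 + d*x + y^2 + e*y + f = 0 (integer coefficients)."""
    h = Fraction(-d, 2)  # x-coordinate of the center
    k = Fraction(-e, 2)  # y-coordinate of the center
    r_squared = h * h + k * k - f
    if r_squared < 0:
        raise ValueError("equation does not describe a real circle")
    return (h, k), math.sqrt(r_squared)


# Problem from the transcript: x^2 + 8x + y^2 - 6y = 0
center, radius = circle_from_general_form(8, -6, 0)
print(center, radius)
```

For the equation above this gives center (-4, 3) and radius 5, matching the model's natural-language reasoning, so the failure is in the generated code rather than in the math.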
Maybe check this example: https://github.com/QwenLM/Qwen-Agent/blob/a9ef165971d37ac37ec1df9e565ca46a38b0afab/examples/tir_math.py
# We use the following two systems to distinguish between COT mode and TIR mode
TIR_SYSTEM = """Please integrate natural language reasoning with programs to solve the problem above, and put your final answer within \\boxed{}."""
COT_SYSTEM = """Please reason step by step, and put your final answer within \\boxed{}."""
'The dependencies for Python Executor support are not installed. '
'Please install the required dependencies by running: pip install "qwen-agent[python_executor]"') from e
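The excerpt suggests that TIR mode needs a Python executor in addition to the system prompt. Here is a rough sketch of what such a loop might look like. I have not verified it against the linked file: the `output` fence format, the round limit, and the `generate` callback (which takes the text so far and returns the model's continuation, stopping after a closed code fence) are all my assumptions.

```python
import re
import subprocess
import sys
from typing import Callable, Optional

FENCE = "`" * 3  # built at runtime so this block contains no literal fence
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def last_python_block(text: str) -> Optional[str]:
    """Return the body of the last closed python fence in text, if any."""
    blocks = CODE_BLOCK.findall(text)
    return blocks[-1] if blocks else None


def run_python(code: str, timeout_s: float = 10.0) -> str:
    """Run code in a fresh interpreter; return stdout, or the last stderr line on failure."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "TimeoutError"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        return lines[-1] if lines else "Error"
    return proc.stdout.strip()


def tir_solve(prompt: str, generate: Callable[[str], str], max_rounds: int = 4) -> str:
    """Alternate between model generation and code execution (hypothetical loop)."""
    text = prompt
    for _ in range(max_rounds):
        completion = generate(text)
        text += completion
        code = last_python_block(completion)
        if code is None:  # no code to run: treat this as the final answer
            break
        # Feed the execution result back so the model can continue from it.
        text += "\n" + FENCE + "output\n" + run_python(code) + "\n" + FENCE + "\n"
    return text
```

With a loop like this, a hallucinated call such as `sp.complete_square` comes back to the model as an `AttributeError` line, giving it a chance to recover. If the generated code is never executed, or the error is never fed back, TIR scores could plausibly fall below CoT. Running model-generated code this way is not sandboxed, so it is only suitable for a throwaway environment.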
The same problem here. For 7B-instruct, I got 77% on GSM8K with TIR and 95.6% with CoT.
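Following the TIR prompt, I ran experiments on the Qwen2.5-Math 1.5B and 7B models, and the resulting metrics are worse than with CoT. I suspect my implementation is missing some steps. Could you describe the implementation in more detail? I implemented TIR based on the prompt below.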