The paper uses the following problem to demonstrate the effectiveness of Step-DPO: "The square root of t is greater than 2 and less than 3.5. How many integer values of t satisfy this condition?" However, when I change the prompt to "The square root of t is greater than 2.3 and less than 3.5. How many integer values of t satisfy this condition?", the answer looks like this: