fgenie / rims_minimal

Been lazy enough to pull over again to the end!

[Per-dataset performance when only a single method is used] #24

Closed by fgenie 4 months ago

fgenie commented 5 months ago

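ChatGPT results

| Dataset | Method | Correct / Total | Accuracy |
| --- | --- | --- | --- |
| GSM | cot | 1037 / 1319 | 78.6% |
| GSM | pal | 1061 / 1319 | 80.4% |
| GSM | p2c | 1003 / 1319 | 76.0% |
| SVAMP | cot | 830 / 1000 | 83.0% |
| SVAMP | pal | 841 / 1000 | 84.1% |
| SVAMP | p2c | 835 / 1000 | 83.5% |
| MATH | cot | 364 / 871 | 41.8% |
| MATH | pal | 470 / 871 | 54.0% |
| MATH | p2c | 457 / 871 | 52.5% |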

fgenie commented 5 months ago

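GPT-4 Turbo results

| Dataset | Method | Correct / Total | Accuracy |
| --- | --- | --- | --- |
| MATH | cot | 482 / 871 | 55.3% |
| MATH | pal | 577 / 871 | 66.2% |
| MATH | p2c | 589 / 871 | 67.6% |
| SVAMP | cot | 919 / 1000 | 91.9% |
| SVAMP | pal | 944 / 1000 | 94.4% |
| SVAMP | p2c | 948 / 1000 | 94.8% |
| GSM | cot | 1210 / 1319 | 91.7% |
| GSM | pal | 1238 / 1319 | 93.9% |
| GSM | p2c | 1238 / 1319 | 93.9% |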
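
For anyone who wants to re-derive the percentages or compare methods, here is a minimal sketch that recomputes accuracy as correct / total from the counts reported in this thread and lists the best single method per dataset. The names `RESULTS`, `accuracy`, and `best_methods` are illustrative assumptions, not part of the rims_minimal code.

```python
# Counts copied from the two result comments in this issue: (correct, total).
# The dict layout and function names are hypothetical, chosen for illustration.
RESULTS = {
    "chatgpt": {
        "GSM": {"cot": (1037, 1319), "pal": (1061, 1319), "p2c": (1003, 1319)},
        "SVAMP": {"cot": (830, 1000), "pal": (841, 1000), "p2c": (835, 1000)},
        "MATH": {"cot": (364, 871), "pal": (470, 871), "p2c": (457, 871)},
    },
    "gpt4turbo": {
        "MATH": {"cot": (482, 871), "pal": (577, 871), "p2c": (589, 871)},
        "SVAMP": {"cot": (919, 1000), "pal": (944, 1000), "p2c": (948, 1000)},
        "GSM": {"cot": (1210, 1319), "pal": (1238, 1319), "p2c": (1238, 1319)},
    },
}


def accuracy(correct, total):
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


def best_methods(per_method):
    """Return the method name(s) with the highest correct count, sorted by name."""
    top = max(correct for correct, _ in per_method.values())
    return sorted(m for m, (correct, _) in per_method.items() if correct == top)


if __name__ == "__main__":
    for model, datasets in RESULTS.items():
        for dataset, per_method in datasets.items():
            scores = ", ".join(
                f"{m}={accuracy(c, t)}%" for m, (c, t) in per_method.items()
            )
            best = "/".join(best_methods(per_method))
            print(f"{model:10s} {dataset:6s} {scores}  best: {best}")
```

Ties are reported together, so pal and p2c both appear as best for GPT-4 Turbo on GSM.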