I read the paper and found 70.2 is task specific fine-tuned version:
Are the values of chatgpt(60.2) and gpt-4(82.3) coming from in-context learning result or fine-tuning? If they are coming from in-context learning, then I think this graph is misleading and unfair.
I read the paper and found 70.2 is task specific fine-tuned version:
Are the values of chatgpt(60.2) and gpt-4(82.3) coming from in-context learning result or fine-tuning? If they are coming from in-context learning, then I think this graph is misleading and unfair.