Closed howard50b closed 1 year ago
Hi, thanks for the timely resources! I have a question regarding the dataset splits -- I noticed that the dataset seems to have only a test set. Is that by design? If so, how can overfitting be prevented (even if it's just tuning the prompts), or is there a plan to add a validation set?
Thank you!
Hi!!! Thank you for reminding us of the overfitting issue. Since we have already released the ground truth of the test set 😫, adding a validation set cannot prevent developers from tuning prompts on the test set if they find the performance unsatisfactory. To mitigate overfitting, we recommend that developers also report their results using our prompt, which is nearly identical across all baselines, or compare their baseline systems using the same prompt they have designed.
Got it. Thanks for the reply!