dvirginz opened this issue 4 years ago
We can't upload the data ourselves, but you can get access to it here. For the code, we are busy right now with the EMNLP deadline but can upload it afterwards.
Definitely! Good luck with EMNLP :) Whenever you have the time :)
+1 on looking forward to Hyperpartisan code & hope EMNLP goes well!
In the paper it is said that "For Hyperpartisan we split the training data into train/dev/test sets using standard 90/10/10 splits". I think there is a typo in the split ratio.
I think the correct description should be that the training data is split into train/test sets using a 90/10 split. Am I right?
Another question: it is said that 'For Hyperpartisan we, ..., performed each experiment five times with different seeds to control variability associated with the small dataset'. So how is the final F1 score calculated?
Is the final score the mean of the best score from each run, or the mean of the last score from each run?
@OleNet Re splits: Yes, 90/10/10 was a typo. We meant 80/10/10 (10% for dev and 10% for test). Re final F1: The final F1 score was calculated as the mean of the test F1 scores from each run. For each run we evaluated the checkpoint with the best dev performance.
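For anyone reproducing this, here is a minimal sketch of that protocol: an 80/10/10 random split plus best-dev checkpoint selection averaged over runs. The function names and data layout are my own, not from the repo (the authors' exact splits are in the linked PR):

```python
import random

def split_80_10_10(examples, seed=0):
    """Shuffle and split a list of examples 80/10/10 into train/dev/test."""
    rng = random.Random(seed)
    idx = list(range(len(examples)))
    rng.shuffle(idx)
    n_train = int(0.8 * len(examples))
    n_dev = int(0.1 * len(examples))
    train = [examples[i] for i in idx[:n_train]]
    dev = [examples[i] for i in idx[n_train:n_train + n_dev]]
    test = [examples[i] for i in idx[n_train + n_dev:]]
    return train, dev, test

def final_f1(runs):
    """Each run is a list of (dev_f1, test_f1) pairs, one per checkpoint.
    Pick the test F1 of the best-dev checkpoint per run, then average."""
    per_run = [max(ckpts, key=lambda pair: pair[0])[1] for ckpts in runs]
    return sum(per_run) / len(per_run)
```

With a different `seed` per experiment you get the "five runs with different seeds" setup; `final_f1` then gives the reported number.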
Got it, thanks for your reply!
Hi, I tested RoBERTa on a randomly split Hyperpartisan dataset (80/10/10). The F1 score is 0.734370 (0.047247) over 6 runs. And I found the 1st team on the leaderboard only got 0.809.
The above scores are much lower than the one reported in the Longformer paper (0.874 for RoBERTa), which puzzles me a lot. I guess the problem is due to the small size of the Hyperpartisan dataset (only 645 samples).
Could you kindly provide the final train/dev/test splits for this dataset? I think it's a key step to getting consistent results and making a fair comparison with your model.
As described in the paper we split the original "training" set of this dataset into 3 parts. The dataset is small and using different splits could change the results considerably. We also did some preprocessing/cleaning on this data that could affect the results. We've added instructions, a preprocessing script and the exact splits we used. Please check out this PR: https://github.com/allenai/longformer/pull/112
Great paper, and a really clean and readable repo, thanks! Any plans to release the Hyperpartisan dataset and benchmark utils? It could really help future researchers go through your pipeline for cleaning and evaluating the data.
Thanks!