Reproduce the performance of the baseline model in the paper

Hi!

I tried to reproduce the performance of the GloVe Embeddings baseline on the AFS dataset, but I could not.

The paper reports a Pearson correlation of .3240 and a Spearman's rank correlation of .3400, but with 10-fold cross-validation I only get around .15 to .20 on both metrics.
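For reference, this is a minimal sketch of how the two metrics can be computed per test fold and then averaged over the 10 folds, in plain Python. Averaging per-fold scores and pooling all predictions before correlating can give different numbers. Averaging per fold is my assumption here, not something I confirmed in the paper, and `cv_scores` is just a hypothetical helper name.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def cv_scores(folds):
    """folds: list of (gold_scores, predicted_scores) pairs, one per test fold.

    Returns (mean Pearson, mean Spearman) averaged over folds (assumed aggregation).
    """
    pearsons = [pearson(gold, pred) for gold, pred in folds]
    spearmans = [spearman(gold, pred) for gold, pred in folds]
    return mean(pearsons), mean(spearmans)
```

If the paper pools all folds instead, the alternative is to concatenate the gold and predicted scores across folds and call `pearson` and `spearman` once.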

Do I have to weight each token's embedding by its IDF, or are there other details I need to follow to reproduce the same results?
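To make the question concrete, here is a minimal sketch of the two variants I mean: a plain average of token embeddings versus an IDF-weighted average, with cosine similarity between the two sentence vectors as the predicted score. The `embeddings` dict stands in for the GloVe vectors, and both the IDF formula log(N / df) and the handling of out-of-vocabulary tokens are my own assumptions, not taken from the paper or this repository.

```python
import math
from collections import Counter


def idf_weights(sentences):
    """IDF per token as log(N / df), df = number of sentences containing it (assumed variant)."""
    n = len(sentences)
    df = Counter(tok for sent in sentences for tok in set(sent))
    return {tok: math.log(n / count) for tok, count in df.items()}


def sentence_vector(tokens, embeddings, idf=None):
    """Average the embeddings of in-vocabulary tokens.

    With idf given, each token vector is weighted by its IDF
    (weight 1.0 for tokens missing from idf); without it, a plain mean.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in embeddings:
            continue  # skip out-of-vocabulary tokens (assumption)
        w = idf.get(tok, 1.0) if idf is not None else 1.0
        total = [t + w * v for t, v in zip(total, embeddings[tok])]
        weight_sum += w
    if weight_sum == 0:
        return total  # zero vector when no token contributed
    return [t / weight_sum for t in total]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

With IDF weighting, tokens that occur in every sentence get weight 0, so frequent words stop dominating the average. That is the difference I suspect might explain the gap.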

I also tried to reproduce the performance of InferSent (both the GloVe and fastText variants), and those results match the paper.

Thank you!

UKPLab / acl2019-BERT-argument-classification-and-clustering, issue #5