The article content in each release contains large amounts of erroneous data, and all article content is truncated to 490 characters. Was this the dataset used to produce the benchmarks in the paper? Could we have the original data?
We used the 05-01-2020 release for the experiments in the arXiv paper. Where article content was unavailable, we used the abstract or title instead, and we only saved the first 500 characters of each article.
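The preprocessing described above (fall back to abstract or title when content is missing, then keep only the first 500 characters) could be sketched as follows; the field names `content`, `abstract`, and `title` are assumptions, not the authors' actual schema:

```python
def truncate_article(record, max_chars=500):
    """Pick the article text with fallbacks, keeping only the first max_chars.

    Assumed record fields: 'content', 'abstract', 'title' (hypothetical names).
    """
    text = record.get("content") or record.get("abstract") or record.get("title") or ""
    return text[:max_chars]

# Example: content is missing, so the abstract is used instead.
record = {"content": "", "abstract": "A short abstract.", "title": "Example"}
print(truncate_article(record))
```

This is only a minimal sketch of the described behavior, not the pipeline actually used to build the releases.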