UKPLab / acl2019-BERT-argument-classification-and-clustering

Apache License 2.0
83 stars, 35 forks

datasets #1

antgr closed this issue 4 years ago

antgr commented 5 years ago

Hi, thanks for sharing this work. I tried to run one of the scripts, but it seems that the datasets are missing, so I get the following error:

    Traceback (most recent call last):
      File "train.py", line 711, in <module>
        main()
      File "train.py", line 571, in main
        train_sampler = RandomSampler(train_data)
      File "/home/ant/anaconda3/envs/pytorch_full/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 94, in __init__
        "value, but got num_samples={}".format(self.num_samples))
    ValueError: num_samples should be a positive integer value, but got num_samples=0
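The error comes from PyTorch's `RandomSampler`, which rejects a dataset of length zero. Here that means `train.py` loaded no training examples at all. The sketch below adds a pre-flight check that fails earlier and says why. It assumes the examples are collected in a Python sequence before the sampler is built. `ensure_training_data` is a hypothetical helper and is not part of the repository.

```python
from typing import Sequence


def ensure_training_data(examples: Sequence, source: str) -> int:
    """Hypothetical helper: fail early if no training examples were loaded.

    torch.utils.data.RandomSampler raises
    "ValueError: num_samples should be a positive integer value, but got num_samples=0"
    for an empty dataset; this check reports the likely cause instead.
    """
    count = len(examples)
    if count == 0:
        raise RuntimeError(
            f"No training examples were loaded from {source!r}. "
            "The dataset is not shipped with the code; follow the README "
            "instructions to obtain it before training."
        )
    return count


# Assumed usage: call this right before RandomSampler(train_data).
print(ensure_training_data(["example 1", "example 2"], "train.tsv"))  # prints 2
```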

Where am I supposed to download the data from?

Thanks, Antonis

antgr commented 5 years ago

OK, I read the README instructions. After a while, the Java program starts displaying this message over and over:

    Warning: Couldn't find sentence '3350ec033187e0182a6ed575a99c8349'

Is something wrong on my part?

antgr commented 4 years ago

@nreimers, are you aware of this issue? I don't currently need to work with the code, but I'm still curious whether it is a known issue.

nreimers commented 4 years ago

Hi @antgr, due to copyright issues it is sadly not possible to share the dataset directly. To bypass this, the code crawls the source documents from archive.org.
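The crawl-from-archive.org approach can be illustrated with the Wayback Machine's public snapshot URL scheme, `https://web.archive.org/web/<timestamp>/<url>`. This is a Python sketch and not the repository's Java crawler. `wayback_url` and `fetch_snapshot` are hypothetical helpers, and the exact URLs and timestamps the dataset stores are assumptions. A 404 from the archive is treated as a missing source document.

```python
import urllib.error
import urllib.request
from typing import Optional


def wayback_url(original_url: str, timestamp: str, raw: bool = True) -> str:
    """Build a Wayback Machine snapshot URL (hypothetical helper).

    The timestamp uses the Wayback format YYYYMMDDhhmmss; a shorter prefix
    such as YYYY or YYYYMMDD is also accepted. With raw=True, the "id_"
    modifier requests the archived page without the Wayback toolbar.
    """
    if not timestamp.isdigit() or not 4 <= len(timestamp) <= 14:
        raise ValueError(f"invalid Wayback timestamp: {timestamp!r}")
    modifier = "id_" if raw else ""
    return f"https://web.archive.org/web/{timestamp}{modifier}/{original_url}"


def fetch_snapshot(original_url: str, timestamp: str, timeout: float = 30.0) -> Optional[str]:
    """Download an archived page; return None if the archive answers 404."""
    url = wayback_url(original_url, timestamp)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # the source document is no longer available
        raise


print(wayback_url("http://example.com/article", "20150101000000"))
# https://web.archive.org/web/20150101000000id_/http://example.com/article
```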

It appears that one of the links / source documents has disappeared (404). Sadly, I'm not the author of the dataset, so I don't know whether it can be fixed.
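The thread does not say how the reconstruction program matches sentences, so the following is only an illustration. It assumes each ID, such as `'3350ec033187e0182a6ed575a99c8349'`, is an MD5 hex digest of the sentence text. That guess rests only on the 32-hex-digit format. Under that assumption, sentences from a document that returns 404 never appear in the crawled text, so every one of their IDs triggers its own warning. `sentence_id` and `match_sentences` are hypothetical helpers.

```python
import hashlib
from typing import Dict, Iterable


def sentence_id(sentence: str) -> str:
    # Assumption: an ID is the MD5 hex digest of the whitespace-normalised sentence.
    normalised = " ".join(sentence.split())
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def match_sentences(wanted_ids: Iterable[str], crawled_sentences: Iterable[str]) -> Dict[str, str]:
    """Map each wanted ID to a crawled sentence; warn for IDs with no match."""
    index = {sentence_id(s): s for s in crawled_sentences}
    found: Dict[str, str] = {}
    for sid in wanted_ids:
        if sid in index:
            found[sid] = index[sid]
        else:
            # A sentence whose source page is gone can never be matched.
            print(f"Warning: Couldn't find sentence '{sid}'")
    return found


crawled = ["A sentence from a page that is still archived."]
wanted = [sentence_id(crawled[0]), sentence_id("A sentence from a page that now returns 404.")]
result = match_sentences(wanted, crawled)  # prints one warning
```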

If you are interested in this dataset, please contact me by email: Rnils@web.de

antgr commented 4 years ago

Thank you, I sent you an email!