Open: Lukecn1 opened this issue 2 years ago
Hi @Lukecn1. Unfortunately, copyright law prevents us from sharing the news articles directly :disappointed: Most articles are available on the Internet Archive, and the code should automatically try to download them from there. The sample data was created early in the project, before we started ensuring articles were on the Internet Archive; so, although the sample data may be missing articles, most of the actual articles used in the SemEval competition should be available.
That's fair, I hadn't considered the copyright aspect.
However, I experienced the same issue when scraping the evaluation dataset.
I will try again from scratch and see if maybe it's an issue on my end.
Hi, this question may be stupid, as I am just a beginner at Python. I created a new environment and successfully installed the requirements.txt. I also installed the downloader with "pip install semeval_8_2022_ia_downloader". When I ran "python -m semeval_8_2022_ia_downloader.cli --links_file=input.csv --dump_dir=output_dir", it said "FileNotFoundError: [Errno 2] No such file or directory: 'input.csv'". Would you please tell me what I should do? Thank you!
Welcome @intifa233. All questions are good ones. I'm opening a separate issue to discuss this. Please see #5.
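For anyone who hits the same `FileNotFoundError`: Python resolves a relative path such as `input.csv` against the current working directory, so the error usually means no file with that name exists in the directory the command was run from. Below is a minimal sketch of a pre-flight check, not part of the downloader itself. The helper names `resolve_links_file` and `build_command` are hypothetical; the only things taken from this thread are the module name and the `--links_file` and `--dump_dir` flags.

```python
from __future__ import annotations

import sys
from pathlib import Path


def resolve_links_file(links_file: str) -> Path:
    """Return the absolute path of the links file, or raise a descriptive error.

    Hypothetical helper for illustration; it is not provided by the downloader.
    """
    path = Path(links_file).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"Links file not found: {path} (current directory: {Path.cwd()})"
        )
    return path


def build_command(links_file: str, dump_dir: str) -> list[str]:
    """Build the command quoted in this thread, with an absolute links-file path.

    The module name and both flags are copied from the command above;
    everything else here is an assumption made for the example.
    """
    path = resolve_links_file(links_file)
    return [
        sys.executable,
        "-m",
        "semeval_8_2022_ia_downloader.cli",
        f"--links_file={path}",
        f"--dump_dir={dump_dir}",
    ]
```

The resulting list can be passed to `subprocess.run(build_command("input.csv", "output_dir"), check=True)`. If the file is missing, the error message shows both the path that was tried and the directory the script was started from, which should make it easier to see where the CSV of links needs to go.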
However, I am having trouble downloading the data, because many of the links no longer work and therefore cannot be scraped.
This is true even for sample_data.csv, where a large percentage of the pairs are missing one or both articles.
Are you able to share the evaluation dataset privately?