euagendas / semeval_8_2022_ia_downloader

internet archive downloader for task 8 at semeval

Update README with clear getting-started instructions for the SemEval task #5

Open computermacgyver opened 1 year ago

computermacgyver commented 1 year ago

Based on the comment from @intifa233 in #4: https://github.com/euagendas/semeval_8_2022_ia_downloader/issues/4#issuecomment-1315668622

Hi, This question may be stupid, I am just a beginner at python. I created a new environment successfully installed the requirements.txt. Also the downloader by "pip install semeval_8_2022_ia_downloader". When I used "python -m semeval_8_2022_ia_downloader.cli --links_file=input.csv --dump_dir=output_dir", it said "FileNotFoundError: [Errno 2] No such file or directory: 'input.csv'". Would you please tell me what should I do? Thank you!

computermacgyver commented 1 year ago

The error above indicates that the file input.csv does not exist in the directory where the command was run. input.csv is only a placeholder: change the command to specify a file of URLs that should be downloaded. For example, you can download semeval-2022_task8_train-data_batch.csv from the dataset on Zenodo and then update the Python command to:

python -m semeval_8_2022_ia_downloader.cli --links_file=semeval-2022_task8_train-data_batch.csv --dump_dir=output_dir

This will read the URLs from semeval-2022_task8_train-data_batch.csv and put the output in a new directory (folder) called output_dir.
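If it helps, the same check can be done from Python before launching the downloader. This is only a rough sketch: the helper name build_command is made up for illustration and is not part of the package. Only the module path and the --links_file and --dump_dir options come from the command above.

```python
from __future__ import annotations

import sys
from pathlib import Path


def build_command(links_file: str, dump_dir: str) -> list[str]:
    """Build the downloader command, failing early if the links file is missing.

    Hypothetical helper for illustration; not part of the package itself.
    """
    path = Path(links_file)
    if not path.is_file():
        # Same situation as the FileNotFoundError above, with a clearer hint.
        raise FileNotFoundError(
            f"{links_file} was not found (current directory: {Path.cwd()}). "
            "Download the CSV from Zenodo first or pass its full path."
        )
    return [
        sys.executable,
        "-m",
        "semeval_8_2022_ia_downloader.cli",
        f"--links_file={path}",
        f"--dump_dir={dump_dir}",
    ]


# Example use (left commented out so nothing runs on import):
# import subprocess
# cmd = build_command("semeval-2022_task8_train-data_batch.csv", "output_dir")
# subprocess.run(cmd, check=True)
```

Running the command through sys.executable keeps it in the same environment where the package was installed with pip.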

The full Zenodo data set is at https://zenodo.org/record/6507872#.Y3QhD1XP0WU

intifa233 commented 1 year ago

Thanks a lot! That is really helpful!