I ran make_datafiles.py to generate the raw text files for BART preprocessing, but I hit the following issue:
python make_datafiles.py ./cnn/stories ./dailymail/stories/
Making bin file for URLs listed in url_lists/all_test.txt...
Traceback (most recent call last):
  File "make_datafiles.py", line 138, in <module>
    write_to_bin(all_test_urls, os.path.join(finished_files_dir, "test"))
  File "make_datafiles.py", line 84, in write_to_bin
    url_list = read_text_file(url_file)
  File "make_datafiles.py", line 26, in read_text_file
    with open(text_file, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'url_lists/all_test.txt'
I assumed this was because all_test_urls does not point to the URL file that ships with the dataset, i.e., wayback_test_urls.txt. So I renamed that file to all_test.txt and put it in the folder ./cnn/url_lists. The script still gave the same error. I checked the source again and I think the problem is the following line, which opens the relative path url_lists/all_test.txt from the current working directory rather than from ./cnn:
url_list = read_text_file(url_file)
So I changed it to:
url_list = read_text_file(os.path.join('./cnn', url_file))
With this change, I think all the source and target files are generated from the CNN dataset only. Am I right?
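To make the path problem concrete, here is a minimal sketch of how the URL list could be located before it is passed to read_text_file. It is only an illustration under my own assumptions: resolve_url_file and the search_dirs argument are hypothetical names that do not exist in make_datafiles.py.

```python
import os


def resolve_url_file(url_file, search_dirs):
    """Return the first existing path for url_file under search_dirs.

    Hypothetical helper, not part of make_datafiles.py. url_file is a
    relative path such as "url_lists/all_test.txt"; search_dirs is a list
    of base directories to try in order.
    """
    tried = []
    for base in search_dirs:
        candidate = os.path.join(base, url_file)
        tried.append(candidate)
        if os.path.isfile(candidate):
            return candidate
    # Report every location that was checked, so the error is easier to debug.
    raise FileNotFoundError(
        "Could not find %s; tried: %s" % (url_file, ", ".join(tried))
    )
```

With something like this, the call in write_to_bin could become read_text_file(resolve_url_file(url_file, [".", "./cnn"])), which tries the current working directory first and falls back to ./cnn. It only changes where the list is looked up; which stories end up in the output still depends on the URLs inside the list.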