allenai / scitldr

https://scitldr.apps.allenai.org/
Apache License 2.0

Error Running make_datafiles.sh #13

Open Yatin97hrc opened 3 years ago

Yatin97hrc commented 3 years ago

Hello authors, awesome work!! I am trying to replicate your work for my project, as mentioned. However, I am stuck at the code snippet below. Please see if you can help.

Running
cd SciTLDR-Data
export TASK=SciTLDR-A
chmod +x make_datafiles.sh
./make_datafiles.sh # BPE preprocess

Error `usage: to_stories.py [-h] [--mapping_dir MAPPING_DIR] [--out_dir OUT_DIR] [--num_cores NUM_CORES] data_dir
to_stories.py: error: the following arguments are required: data_dir
usage: make_datafiles.py [-h] [--stories_dir STORIES_DIR] [--urldir URLDIR] [--finished_files_dir FINISHED_FILES_DIR]
make_datafiles.py: error: argument --finished_files_dir: expected one argument

--2021-03-28 10:36:22-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘encoder.json’ not modified on server. Omitting download.

--2021-03-28 10:36:23-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/vocab.bpe
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘vocab.bpe’ not modified on server. Omitting download.

--2021-03-28 10:36:23-- https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/dict.txt
Resolving dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)... 104.22.75.142, 104.22.74.142, 172.67.9.4, ...
Connecting to dl.fbaipublicfiles.com (dl.fbaipublicfiles.com)|104.22.75.142|:443... connected.
HTTP request sent, awaiting response... 304 Not Modified
File ‘dict.txt’ not modified on server. Omitting download.

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/train.source'

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/train.target'

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/val.source'

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/val.target'

usage: fairseq-preprocess [-h] [--no-progress-bar] [--log-interval LOG_INTERVAL] [--log-format {json,none,simple,tqdm}] [--tensorboard-logdir TENSORBOARD_LOGDIR] [--wandb-project WANDB_PROJECT] [--azureml-logging] [--seed SEED] [--cpu] [--tpu] [--bf16] [--memory-efficient-bf16] [--fp16] [--memory-efficient-fp16] [--fp16-no-flatten-grads] [--fp16-init-scale FP16_INIT_SCALE] [--fp16-scale-window FP16_SCALE_WINDOW] [--fp16-scale-tolerance FP16_SCALE_TOLERANCE] [--min-loss-scale MIN_LOSS_SCALE] [--threshold-loss-scale THRESHOLD_LOSS_SCALE] [--user-dir USER_DIR] [--empty-cache-freq EMPTY_CACHE_FREQ] [--all-gather-list-size ALL_GATHER_LIST_SIZE] [--model-parallel-size MODEL_PARALLEL_SIZE] [--quantization-config-path QUANTIZATION_CONFIG_PATH] [--profile] [--reset-logging] [--suppress-crashes] [--use-plasma-view] [--plasma-path PLASMA_PATH] [--criterion {sentence_ranking,wav2vec,model,label_smoothed_cross_entropy,latency_augmented_label_smoothed_cross_entropy,legacy_masked_lm_loss,nat_loss,ctc,label_smoothed_cross_entropy_with_alignment,cross_entropy,sentence_prediction,composite_loss,masked_lm,adaptive_loss,vocab_parallel_cross_entropy}] [--tokenizer {space,moses,nltk}] [--bpe {bytes,gpt2,byte_bpe,sentencepiece,bert,characters,hf_byte_bpe,fastbpe,subword_nmt}] [--simul-type {hard_aligned,infinite_lookback,waitk,waitk_fixed_pre_decision,hard_aligned_fixed_pre_decision,infinite_lookback_fixed_pre_decision}] [--optimizer {nag,adafactor,adam,composite,adagrad,adamax,sgd,cpu_adam,lamb,adadelta}] [--lr-scheduler {cosine,pass_through,polynomial_decay,reduce_lr_on_plateau,inverse_sqrt,tri_stage,triangular,manual,fixed}] [--scoring {sacrebleu,bleu,wer,chrf}] [--task TASK] [-s SRC] [-t TARGET] [--trainpref FP] [--validpref FP] [--testpref FP] [--align-suffix FP] [--destdir DIR] [--thresholdtgt N] [--thresholdsrc N] [--tgtdict FP] [--srcdict FP] [--nwordstgt N] [--nwordssrc N] [--alignfile ALIGN] [--dataset-impl FORMAT] [--joined-dictionary] [--only-source] [--padding-factor N] [--workers N]
fairseq-preprocess: error: argument --destdir: expected one argument
usage: build_ctrl_datasets.py [-h] [--outdir OUTDIR] datadir
build_ctrl_datasets.py: error: the following arguments are required: datadir
Times to run script: 2.3285547892252606e-06 min
Making bin file for URLs listed in /ctrl/mapping/mapping_test.txt...

Traceback (most recent call last):
  File "make_datafiles.py", line 117, in write_to_bin(all_test_urls, args.stories_dir, os.path.join(args.finished_files_dir, "test"))
  File "make_datafiles.py", line 76, in write_to_bin url_list = read_text_file(url_file)
  File "make_datafiles.py", line 17, in read_text_file with open(text_file, "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/mapping/mapping_test.txt'

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.source'

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.target'

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/val.source'

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals)
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 129, in main()
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in main for input in args.inputs
  File "/content/drive/.shortcut-targets-by-id/1Y0y4gKBXV3n7Shh3znhZPs6ajOz1g_g3/Semester 4/MAJOR_PROJECT/scitldr/SciTLDR-Data/multiprocessing_bpe_encoder.py", line 63, in for input in args.inputs
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/val.target'

2021-03-28 10:36:44 | INFO | fairseq_cli.preprocess | Namespace(align_suffix=None, alignfile=None, all_gather_list_size=16384, azureml_logging=False, bf16=False, bpe=None, cpu=False, criterion='cross_entropy', dataset_impl='mmap', destdir='/ctrl-bin/', empty_cache_freq=0, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, joined_dictionary=False, log_format=None, log_interval=100, lr_scheduler='fixed', memory_efficient_bf16=False, memory_efficient_fp16=False, min_loss_scale=0.0001, model_parallel_size=1, no_progress_bar=False, nwordssrc=-1, nwordstgt=-1, only_source=False, optimizer=None, padding_factor=8, plasma_path='/tmp/plasma', profile=False, quantization_config_path=None, reset_logging=False, scoring='bleu', seed=1, simul_type=None, source_lang='source', srcdict='dict.txt', suppress_crashes=False, target_lang='target', task='translation', tensorboard_logdir=None, testpref=None, tgtdict='dict.txt', threshold_loss_scale=None, thresholdsrc=0, thresholdtgt=0, tokenizer=None, tpu=False, trainpref='/ctrl/train.bpe', use_plasma_view=False, user_dir=None, validpref='/ctrl/val.bpe', wandb_project=None, workers=60)
2021-03-28 10:36:45 | INFO | fairseq_cli.preprocess | [source] Dictionary: 50264 types

Traceback (most recent call last):
  File "/usr/local/bin/fairseq-preprocess", line 33, in sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-preprocess')())
  File "/content/fairseq/fairseq_cli/preprocess.py", line 394, in cli_main main(args)
  File "/content/fairseq/fairseq_cli/preprocess.py", line 284, in main make_all(args.source_lang, src_dict)
  File "/content/fairseq/fairseq_cli/preprocess.py", line 252, in make_all make_dataset(vocab, args.trainpref, "train", lang, num_workers=args.workers)
  File "/content/fairseq/fairseq_cli/preprocess.py", line 248, in make_dataset make_binary_dataset(vocab, input_prefix, output_prefix, lang, num_workers)
  File "/content/fairseq/fairseq_cli/preprocess.py", line 133, in make_binary_dataset offsets = Binarizer.find_offsets(input_file, num_workers)
  File "/content/fairseq/fairseq/binarizer.py", line 106, in find_offsets with open(PathManager.get_local_path(filename), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/ctrl/train.bpe.source'`

maruthi-syn-ack commented 3 years ago

It is expecting an argument...

to_stories.py: error: the following arguments are required: data_dir
usage: make_datafiles.py [-h] [--stories_dir STORIES_DIR] [--urldir URLDIR] [--finished_files_dir FINISHED_FILES_DIR]
make_datafiles.py: error: argument --finished_files_dir: expected one argument
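
One possible reading of the log, offered as a guess rather than a confirmed diagnosis: every failing path begins with a bare `/` (`/train.source`, `/ctrl/mapping/mapping_test.txt`, `destdir='/ctrl-bin/'`), which is what you would see if the script builds its paths from `$TASK` and `TASK` was empty when the script ran. The small Python sketch below only illustrates that expansion. The path templates are hypothetical, picked to match the paths in the log, and are not copied from `make_datafiles.sh`.

```python
from string import Template

# Hypothetical templates, chosen only because they reproduce the paths in
# the log above; the real make_datafiles.sh may build its paths differently.
PATH_TEMPLATES = [
    "$TASK/train.source",
    "$TASK/ctrl/mapping/mapping_test.txt",
    "$TASK/ctrl-bin/",
]


def expand_paths(task):
    """Mimic shell-style expansion of $TASK for each template."""
    return [Template(t).substitute(TASK=task) for t in PATH_TEMPLATES]


if __name__ == "__main__":
    # Empty TASK: every path collapses to something rooted at "/".
    print(expand_paths(""))
    # TASK set as in the original post: paths stay inside the task directory.
    print(expand_paths("SciTLDR-A"))
```

With an empty `TASK`, the first call yields the same rooted paths that appear in the `FileNotFoundError` messages, so checking `echo $TASK` in the same shell that runs the script may be a useful first step.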

SumanthSrungavarapu commented 1 year ago

Hey, have you succeeded in replicating this work? I am also stuck at this step. Please help.
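
A possible workaround, sketched under assumptions: the `/content/drive/...` paths in the log suggest a notebook environment such as Colab, where each `!` command runs in its own subshell, so a separate `!export TASK=SciTLDR-A` line would not carry over to the `!./make_datafiles.sh` line. If that is what happened, passing `TASK` explicitly in the child process environment avoids the problem. The helper names `make_env` and `run_make_datafiles` are made up for this sketch, and it assumes the task directory sits inside `SciTLDR-Data`, as the commands in the original post imply.

```python
import os
import subprocess
from pathlib import Path


def make_env(task, base_env=None):
    """Return a copy of the environment with TASK set explicitly."""
    env = dict(os.environ if base_env is None else base_env)
    env["TASK"] = task
    return env


def run_make_datafiles(data_dir="SciTLDR-Data", task="SciTLDR-A"):
    """Run make_datafiles.sh from data_dir with TASK set for the child process.

    Assumes the task directory (e.g. SciTLDR-Data/SciTLDR-A) already exists.
    """
    data_dir = Path(data_dir)
    task_dir = data_dir / task
    if not task_dir.is_dir():
        # Fail early with a readable message instead of a cascade of
        # FileNotFoundError tracebacks from the downstream scripts.
        raise FileNotFoundError(f"{task_dir} not found; check data_dir and task")
    return subprocess.run(
        ["bash", "make_datafiles.sh"],
        cwd=data_dir,
        env=make_env(task),
        check=True,
    )
```

Calling `run_make_datafiles("SciTLDR-Data", "SciTLDR-A")` from the repository root would then run the script with `TASK` guaranteed to be visible to it. Whether this resolves the errors above has not been verified against the repository.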