osession opened this issue 1 year ago
Hi, 'graph_prediction_with_flag' is actually a custom task registered in the fairseq framework, located here:
This custom task has to be imported at runtime so that it gets registered. The default parameters in the evaluate.sh script define the location of the 'graph_prediction_with_flag' task:
So the problem is that evaluate.py can't find this custom task. Maybe you are not running the ./run_custom_input.sh command from the Dynaformer (project root) directory, which makes the relative path invalid. Or maybe you are running evaluate.py without some of the parameters the script passes.
Please follow the steps in README.md. If the problem still exists, please post detailed steps here, and I will be happy to look into it :D
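For context: fairseq imports the directory given to --user-dir as a Python package, and a custom task only becomes a valid --task choice once the module that registers it (with fairseq's register_task decorator) has been imported. The sketch below is a hypothetical pre-flight check, not something the Dynaformer scripts ship; the function name check_user_dir is an assumption, and it also assumes the user directory imports its task modules from its __init__.py, which is the usual fairseq layout.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the directory passed to --user-dir.
# fairseq imports this directory as a Python package; custom tasks such as
# 'graph_prediction_with_flag' are registered only as a side effect of that import.
check_user_dir() {
  local user_dir="$1"
  if [ ! -d "$user_dir" ]; then
    echo "error: --user-dir '$user_dir' does not exist (wrong working directory?)" >&2
    return 1
  fi
  # Assumption: the package's __init__.py is what imports the task modules.
  if [ ! -f "$user_dir/__init__.py" ]; then
    echo "error: '$user_dir' has no __init__.py, so its tasks may never be registered" >&2
    return 1
  fi
  return 0
}

# Usage inside a launcher script (the path is illustrative):
# check_user_dir "$(pwd)/dynaformer" || exit 1
```

Failing here gives a message about the path itself, which is easier to act on than fairseq's long "invalid choice" list.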
I am running ./run_evaluate.sh from the home directory. I don't think I am running evaluate.py with fewer parameters, since I have not modified evaluate.sh at all. It seems that instead of looking in the path you showed (https://github.com/Minys233/dynaformer_model/blob/c9942c389e545a5f43f0834031ce36034cb9b343/examples/evaluate/evaluate.sh#L27), could it be looking here instead? https://github.com/facebookresearch/fairseq/tree/98ebe4f1ada75d006717d84f9d603519d8ff5579/fairseq/tasks
At least, the task names listed in the error I'm still getting are the ones defined in that directory.
I think I figured out the issue. I was getting this error: Dynaformer/examples/evaluate/evaluate.sh: line 25: realpath: command not found. When I removed the realpath command and replaced those lines with the plain file path string, the script was able to find graph_prediction.py. Thank you for your help!!
Glad to hear this, and thank you for pointing it out! After some googling and testing, I found that the realpath command is part of coreutils but is not available on every system (older coreutils releases do not include it). The readlink -f command can be used instead for the same purpose. I will update README.md and the corresponding scripts soon.
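A minimal bash sketch of a portable replacement for the realpath calls: it uses realpath when it is installed, falls back to readlink -f, and finally to cd plus pwd -P. The helper name abspath is hypothetical, not something in the Dynaformer scripts, and the last fallback does not resolve a symlink that is the final component of a file path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the absolute path of an existing file or directory
# without requiring the realpath binary to be installed.
abspath() {
  local target="$1"
  if command -v realpath >/dev/null 2>&1; then
    realpath "$target"
  elif readlink -f / >/dev/null 2>&1; then
    # This readlink supports -f (e.g. GNU coreutils or BusyBox).
    readlink -f "$target"
  elif [ -d "$target" ]; then
    # Pure-shell fallback for directories.
    (cd "$target" && pwd -P)
  else
    # Pure-shell fallback for files: only the parent directory is resolved.
    local dir
    dir="$(cd "$(dirname "$target")" && pwd -P)" || return 1
    printf '%s/%s\n' "$dir" "$(basename "$target")"
  fi
}

# Example replacement for a line such as --user-dir "$(realpath ./dynaformer)" \
# --user-dir "$(abspath ./dynaformer)" \
```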
Hello again, I've been trying to run bash Dynaformer/examples/md_pretrain/md_train.sh, but I am running into the same 'invalid choice: graph_prediction_with_flag' error I had before. It still isn't working, even after adjusting the realpath command. Sorry to bring this up again!
fairseq-train: error: argument --task: invalid choice: 'graph_prediction_with_flag' (choose from 'translation', 'translation_from_pretrained_xlm', 'denoising', 'multilingual_denoising', 'speech_to_text', 'text_to_speech', 'hubert_pretraining', 'online_backtranslation', 'sentence_prediction', 'speech_to_speech', 'simul_speech_to_text', 'simul_text_to_text', 'audio_pretraining', 'audio_finetuning', 'cross_lingual_lm', 'frm_text_to_speech', 'multilingual_translation', 'translation_from_pretrained_bart', 'semisupervised_translation', 'multilingual_masked_lm', 'translation_multi_simple_epoch', 'language_modeling', 'multilingual_language_modeling', 'translation_lev', 'masked_lm', 'sentence_ranking', 'legacy_masked_lm', 'dummy_lm', 'dummy_masked_lm', 'dummy_mt')
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 96930) of binary: /home/ray/anaconda3/bin/python
Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ray/anaconda3/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 689, in run
elastic_launch(
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
************************************************
/home/ray/anaconda3/bin/fairseq-train FAILED
================================================
Root Cause:
[0]:
time: 2023-07-11_09:24:00
rank: 0 (local_rank: 0)
exitcode: 2 (pid: 96930)
error_file: <N/A>
msg: "Process failed with exitcode 2"
================================================
Other Failures:
<NO_OTHER_FAILURES>
************************************************
I figured out that the user directory was incorrect, which is why fairseq was unable to find the 'graph_prediction_with_flag' custom task. So I changed line 157 in md_train.sh from:
--user-dir "$(realpath ./dynaformer)" \
to:
--user-dir "$(realpath ./Dynaformer/dynaformer)" \
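Hard-coding ./Dynaformer/dynaformer only works when the script is launched from the directory that contains the Dynaformer checkout. A hedged alternative is to derive the path from the script's own location. This sketch assumes md_train.sh stays at <root>/examples/md_pretrain/, as in the command quoted earlier in this thread; project_root_of is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given the path of a script that lives two levels below
# the project root (e.g. <root>/examples/md_pretrain/md_train.sh), print the
# physical path of the project root, independent of the current directory.
project_root_of() {
  local script_path="$1"
  (cd "$(dirname "$script_path")/../.." && pwd -P)
}

# Inside md_train.sh this could replace the hard-coded relative path:
# root="$(project_root_of "${BASH_SOURCE[0]}")"
# ... --user-dir "$root/dynaformer" \
```

Because the result depends only on where the script file is, the same line works whether the script is started from the home directory or from the project root.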
However, training still stops with this error:
Root at /home/ray/dataset
Loading hybrid data from md-refined2019-5-5-5, general-set-2019-coreset-2016
Downloading https://scientificdata.blob.core.windows.net/dynaformer/dataset/mddata/md-refined2019-5-5-5.zip
Extracting /home/ray/dataset/md-refined2019-5-5-5.zip
Processing...
Loading file: /home/ray/dataset/md-refined2019-5-5-5_train_val.pkl, exists? True
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 54483) of binary: /home/ray/anaconda3/bin/python
Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ray/anaconda3/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/distributed/run.py", line 689, in run
elastic_launch(
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ray/anaconda3/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
*************************************************
/home/ray/anaconda3/bin/fairseq-train FAILED
=================================================
Root Cause:
[0]:
time: 2023-07-13_09:03:04
rank: 0 (local_rank: 0)
exitcode: -9 (pid: 54483)
error_file: <N/A>
msg: "Signal 9 (SIGKILL) received by PID 54483"
=================================================
Other Failures:
<NO_OTHER_FAILURES>
*************************************************
I figured out that the solution to the above error was to switch my head node to an instance type with 122 GB of memory instead of 30 GB, and it seems to be working now.
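Exit code -9 means the process received SIGKILL; on Linux this commonly comes from the kernel's out-of-memory killer while a large dataset (here the .pkl file) is being loaded into RAM. Below is a minimal, Linux-only sketch of a pre-flight check that reads MemAvailable from /proc/meminfo. The function name and the threshold you pass are assumptions, since the real memory requirement depends on the dataset.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight memory check (Linux only).
# Returns 0 if MemAvailable >= required GB, 1 if it is lower,
# and 2 if MemAvailable could not be read.
check_available_memory() {
  local required_gb="$1"
  local meminfo="${2:-/proc/meminfo}"   # overridable, e.g. for testing
  local avail_kb
  avail_kb="$(awk '/^MemAvailable:/ {print $2}' "$meminfo")"
  if [ -z "$avail_kb" ]; then
    echo "warning: could not read MemAvailable from $meminfo" >&2
    return 2
  fi
  # /proc/meminfo reports kB; convert to whole GiB (rounded down).
  local avail_gb=$(( avail_kb / 1024 / 1024 ))
  if [ "$avail_gb" -lt "$required_gb" ]; then
    echo "warning: only ${avail_gb} GB available, ${required_gb} GB requested" >&2
    return 1
  fi
  return 0
}
```

A launcher could call check_available_memory <GB> || exit 1 before fairseq-train. After a kill, the kernel log (dmesg, which may require root) usually records the out-of-memory event.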
Hi, I have been trying to run run_evaluation.sh with the provided checkpoints, downloaded and unzipped into the checkpoints directory, and I am running into this error:
evaluate.py: error: argument --task: invalid choice: 'graph_prediction_with_flag' (choose from 'hubert_pretraining', 'denoising', 'multilingual_denoising', 'translation', 'multilingual_translation', 'translation_from_pretrained_bart', 'translation_lev', 'language_modeling', 'speech_to_text', 'legacy_masked_lm', 'text_to_speech', 'speech_to_speech', 'online_backtranslation', 'simul_speech_to_text', 'simul_text_to_text', 'audio_pretraining', 'semisupervised_translation', 'frm_text_to_speech', 'sentence_prediction', 'cross_lingual_lm', 'translation_from_pretrained_xlm', 'multilingual_language_modeling', 'audio_finetuning', 'masked_lm', 'sentence_ranking', 'translation_multi_simple_epoch', 'multilingual_masked_lm', 'dummy_lm', 'dummy_masked_lm', 'dummy_mt')
I can't find a graph_prediction_with_flag.py script anywhere. Has it been removed permanently, or is there another way to run predictions?
Thanks!
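One way to locate a custom task is to search the checkout for the task name, because the file name does not have to match the task name (earlier in this thread the task was found through graph_prediction.py). A minimal sketch; find_task_definition is a hypothetical helper, and it assumes that the top-level package containing the reported file is what --user-dir should point to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the Python files under a directory that mention a
# given task name (fixed-string match), e.g. to find where
# 'graph_prediction_with_flag' is registered.
find_task_definition() {
  local task="$1"
  local search_root="${2:-.}"
  grep -rlF --include='*.py' -- "$task" "$search_root"
}

# Usage from the directory that contains the Dynaformer checkout:
# find_task_definition graph_prediction_with_flag ./Dynaformer
```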