okuchaiev closed this issue 10 months ago.
The fix for this is #7943.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
Traceback (most recent call last):
File "/opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py", line 46, in <module>
main()
File "/opt/NeMo/nemo/core/config/hydra_runner.py", line 129, in wrapper
_run_hydra(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
_run_app(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
run_and_report(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 216, in run_and_report
raise ex
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
return func()
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 453, in <lambda>
lambda: hydra.run(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/opt/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py", line 40, in main
model = MegatronGPTModel(cfg.model, trainer)
File "/opt/NeMo/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py", line 273, in __init__
super().__init__(cfg, trainer=trainer, no_lm_init=True)
File "/opt/NeMo/nemo/collections/nlp/models/language_modeling/megatron_base_model.py", line 221, in __init__
self._build_tokenizer()
File "/opt/NeMo/nemo/collections/nlp/models/language_modeling/megatron_base_model.py", line 421, in _build_tokenizer
self.tokenizer = get_nmt_tokenizer(
File "/opt/NeMo/nemo/collections/nlp/modules/common/tokenizer_utils.py", line 175, in get_nmt_tokenizer
raise ValueError("No Tokenizer path provided or file does not exist!")
ValueError: No Tokenizer path provided or file does not exist!
I'm getting this error when trying to run Nemotron. Has this issue been resolved? How can I make sure the tokenizer is loaded?
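For reference, here is a minimal sanity check that exercises the same code path outside of training, assuming a SentencePiece tokenizer file (the path below is a placeholder, not a real one). If this raises the same ValueError, the file referenced by model.tokenizer.model in the Hydra config does not exist or was never set:

```python
from nemo.collections.nlp.modules.common.tokenizer_utils import get_nmt_tokenizer

# Placeholder path -- replace with the tokenizer file shipped with your checkpoint.
TOKENIZER_MODEL = "/path/to/tokenizer.model"

# This is the same call that MegatronBaseModel._build_tokenizer ends up making;
# it raises "No Tokenizer path provided or file does not exist!" when the file is missing.
tokenizer = get_nmt_tokenizer(
    library="sentencepiece",
    tokenizer_model=TOKENIZER_MODEL,
)
print(type(tokenizer).__name__, tokenizer.vocab_size)
```

If the check passes, make sure the same path is what the pretraining script actually receives via the model.tokenizer.* section of the config (e.g. model.tokenizer.library and model.tokenizer.model overrides).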
Describe the bug A lot of unit tests in the NLP collection (more than 10) require the correct version of the /home/TestData folder (from the internal CI machines) to be present in order to run successfully.
This makes it impossible to run the unit tests successfully anywhere other than on internal NVIDIA CI machines.
To Reproduce Clone NeMo on a new machine in a clean environment and run pytest tests/collections/nlp. Make sure the machine does not have a /home/TestData folder.
Expected behavior Unit tests run via the pytest command should pass, not only on CI machines; e.g., an external developer or contributor should be able to run the unit tests.
Stack trace/logs
Environment (please complete the following information):
PyTorch version: 2.*
CUDA version:
NCCL version:
Proposed fix
I propose that these tests either be skipped when the /home/TestData folder is not found, or be rewritten so they do not depend on it (see the sketch below).
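A minimal sketch of the skip approach, using a module-level pytest marker (the test name and body below are hypothetical, not actual NeMo test code):

```python
import os

import pytest

# Skip data-dependent tests when the CI-only folder is absent.
TEST_DATA_DIR = "/home/TestData"

requires_test_data = pytest.mark.skipif(
    not os.path.isdir(TEST_DATA_DIR),
    reason="requires /home/TestData, which is only available on internal CI machines",
)


@requires_test_data
def test_megatron_dataset_loading():
    # Hypothetical data-dependent test body; it only runs when the folder exists.
    assert os.path.isdir(TEST_DATA_DIR)
```

With this marker applied to the affected tests, pytest tests/collections/nlp would report them as skipped (with the reason shown) on machines without /home/TestData instead of failing.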