NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0
10.99k stars 2.29k forks

Unable to disable validation #9385

Closed liyier90 closed 1 week ago

liyier90 commented 1 month ago

Describe the bug

Setting model.data.splits_string to 1000,0,0 (all data assigned to training, none to validation or test) makes training fail during model setup with TypeError: object of type 'NoneType' has no len()

Steps/Code to reproduce bug

  1. Follow the steps in the GPT model training tutorial
  2. Change model.data.splits_string=\'1000,0,0\'
  3. Run the pretraining script (examples/nlp/language_modeling/megatron_gpt_pretraining.py)
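
For context on step 2, here is a minimal sketch of how a comma-separated split string can be read as train/validation/test weights. The helper name split_fractions is hypothetical and this is not NeMo's actual parser; it only illustrates that 1000,0,0 leaves nothing for validation or test.

```python
def split_fractions(splits_string: str) -> list[float]:
    """Normalize comma-separated split weights into fractions.

    Hypothetical helper for illustration only, not NeMo's implementation.
    """
    weights = [float(w) for w in splits_string.split(",")]
    total = sum(weights)
    if total <= 0:
        raise ValueError("split weights must sum to a positive number")
    return [w / total for w in weights]


# All weight goes to training; validation and test receive nothing.
train, valid, test = split_fractions("1000,0,0")
print(train, valid, test)  # 1.0 0.0 0.0
```

With the tutorial's 980,10,10 the same helper gives 0.98, 0.01, 0.01, which matches the configuration that ran without error.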

Expected behavior

I should be able to perform pretraining without having to:

  1. Provide a validation dataset
  2. Provide a test dataset
  3. Run validation

Environment overview (please complete the following information)

N/A. I was able to run the tutorial when splits_string=\'980,10,10\'.

Additional context

Traceback:

Traceback (most recent call last):
  File "/data/projects/11003281/multi-node/cache/source_files/nemo/NeMo/examples/nlp/language_modeling/megatron_gpt_pretraining.py", line 42, in main
    trainer.fit(model)
  File "/data/projects/11003281/multi-node/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/data/projects/11003281/multi-node/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/data/projects/11003281/multi-node/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "/data/projects/11003281/multi-node/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/data/projects/11003281/multi-node/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 949, in _run
    call._call_setup_hook(self)  # allow user to set up LightningModule in accelerator environment
  File "/data/projects/11003281/multi-node/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 94, in _call_setup_hook
    _call_lightning_module_hook(trainer, "setup", stage=fn)
  File "/data/projects/11003281/multi-node/envs/nemo/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/data/projects/11003281/multi-node/cache/source_files/nemo/NeMo/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py", line 1619, in setup
    self.setup_validation_data(self.cfg.data)
  File "/data/projects/11003281/multi-node/cache/source_files/nemo/NeMo/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py", line 1645, in setup_validation_data
    f'Setting up validation dataloader with len(len(self._validation_ds)): {len(self._validation_ds)} and consumed samples: {consumed_samples}'
TypeError: object of type 'NoneType' has no len()
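
The traceback shows len() being called on self._validation_ds while it is None. A possible guard might look like the sketch below. The function describe_validation_ds is hypothetical and is not NeMo code or the project's fix; it only shows checking for None before taking the length. Whether the rest of the training loop tolerates a missing validation dataloader is not verified here.

```python
import logging
from typing import Optional, Sized


def describe_validation_ds(
    validation_ds: Optional[Sized], consumed_samples: int = 0
) -> Optional[str]:
    """Return the setup log message, or None when no validation set exists.

    Hypothetical stand-in for the logging done in setup_validation_data.
    """
    if validation_ds is None:
        # Empty validation split: skip instead of calling len(None).
        logging.info("No validation dataset; skipping validation dataloader setup.")
        return None
    return (
        f"Setting up validation dataloader with len(validation_ds): "
        f"{len(validation_ds)} and consumed samples: {consumed_samples}"
    )


print(describe_validation_ds(None))        # None
print(describe_validation_ds([1, 2, 3]))   # message reporting a length of 3
```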

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 week ago

This issue was closed because it has been inactive for 7 days since being marked as stale.