A user reported a crash when running SFT with 24.01.01 (things work fine with 24.01):
  File "/opt/NeMo-Aligner/examples/nlp/gpt/train_gpt_sft.py", line 215, in main
    init_using_ptl(trainer, ptl_model, train_dataloader, train_ds)
  File "/opt/NeMo-Aligner/nemo_aligner/utils/train_script_utils.py", line 103, in init_using_ptl
    call._call_setup_hook(ptl_trainer)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 86, in _call_setup_hook
    _call_lightning_module_hook(trainer, "setup", stage=fn)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 145, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/opt/NeMo/nemo/collections/nlp/models/language_modeling/megatron_gpt_model.py", line 1372, in setup
    self._reconfigure_val_batches()
  File "/opt/NeMo/nemo/collections/nlp/models/language_modeling/megatron_base_model.py", line 340, in _reconfigure_val_batches
    val_len_in_micro_batches = len(self._validation_dl)
TypeError: object of type 'NoneType' has no len()
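For anyone triaging: the last frame suggests `_validation_dl` is still `None` when `len()` is called on it. Below is a minimal sketch that reproduces the same `TypeError` outside NeMo and shows one possible guard. `FakeModel`, both method names, and `reproduce` are hypothetical stand-ins for illustration, not NeMo or NeMo-Aligner code, and the guard is an assumption about a possible workaround rather than the actual fix.

```python
from typing import Optional, Sized


class FakeModel:
    """Hypothetical stand-in for the model in the traceback; not NeMo code."""

    def __init__(self, validation_dl: Optional[Sized] = None):
        # Stays None when no validation dataloader has been attached.
        self._validation_dl = validation_dl

    def reconfigure_val_batches_unguarded(self) -> int:
        # Mirrors the failing line: len() on a possibly-None attribute.
        return len(self._validation_dl)

    def reconfigure_val_batches_guarded(self) -> Optional[int]:
        # Assumed guard: skip the computation when there is no dataloader.
        if self._validation_dl is None:
            return None
        return len(self._validation_dl)


def reproduce() -> str:
    """Return the error message raised by the unguarded call."""
    try:
        FakeModel().reconfigure_val_batches_unguarded()
    except TypeError as err:
        return str(err)
    return ""


if __name__ == "__main__":
    print(reproduce())
    print(FakeModel().reconfigure_val_batches_guarded())
    print(FakeModel([0, 1, 2]).reconfigure_val_batches_guarded())
```

Whether skipping the reconfiguration is acceptable, or whether the validation dataloader should instead be built before the setup hook runs, would need to be checked against the real code.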