creating a NeMo model - Githubissues

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Apache License 2.0

11.79k stars 2.45k forks

I am trying to learn NeMo from "tutorials/01_NeMo_Models.ipynb".

At the end of the notebook, after creating the NeMoGPTv2 class, I try to create a model: model = NeMoGPTv2(cfg=cfg.model)

I get the following error:

   File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/IPython/core/interactiveshell.py", line 3577, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-67-1b7caab869c2>", line 1, in <module>
    model = NeMoGPTv2(cfg=cfg.model)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "<ipython-input-31-f04b7157a9ba>", line 3, in __init__
    super().__init__(cfg=cfg, trainer=trainer)
  File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/nemo/core/classes/modelPT.py", line 154, in __init__
    self.setup_multiple_validation_data(val_data_config=cfg.validation_ds)
  File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/nemo/core/classes/modelPT.py", line 539, in setup_multiple_validation_data
    model_utils.resolve_validation_dataloaders(model=self)
  File "/home/shabs/anaconda3/envs/NeMo/lib/python3.11/site-packages/nemo/utils/model_utils.py", line 293, in resolve_validation_dataloaders
    model.setup_validation_data(cfg.validation_ds)
  File "<ipython-input-66-0c8f18429ac6>", line 23, in setup_validation_data
    vocab = f.read().split('')[:-1]  # the -1 here is for the dangling  token in the file
            ^^^^^^^^^^^^^^^^^^
ValueError: empty separator
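
The ValueError at the bottom of the traceback can be reproduced without NeMo, since Python's `str.split` rejects an empty string as the separator. A minimal sketch, where the sample strings are made up for illustration and are not the real contents of vocab.txt:

```python
# Minimal reproduction of the error from the traceback, independent of NeMo.
text = "abc"  # hypothetical file contents, for illustration only

try:
    text.split('')  # an empty separator is not allowed
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # empty separator

# Calling split() with no argument splits on runs of whitespace instead
# and discards the whitespace itself.
whitespace_tokens = "a b\nc".split()
print(whitespace_tokens)  # ['a', 'b', 'c']
```

So the failing line needs a non-empty separator string, and it has to be the same string that was written between tokens when vocab.txt was created.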

class NeMoGPTv2(NeMoGPT):

    def setup_training_data(self, train_data_config: OmegaConf):
        self.vocab = None
        self._train_dl = self._setup_data_loader(train_data_config)

        # Save the vocab into a text file for now
        with open('vocab.txt', 'w') as f:
            for token in self.vocab:
                f.write(f"{token}")

        # This is going to register the file into .nemo!
        # When you later use .save_to(), it will copy this file into the tar file.
        self.register_artifact('vocab_file', 'vocab.txt')

    def setup_validation_data(self, val_data_config: OmegaConf):
        vocab_file = self.register_artifact('vocab_file', 'vocab.txt')

        with open(vocab_file, 'r') as f:
            vocab = f.read().split()[:-1]  # Split based on whitespace characters
            self.vocab = vocab

        self._validation_dl = self._setup_data_loader(val_data_config)

    def setup_test_data(self, test_data_config: OmegaConf):
        # This is going to try to find the same file, and if it fails,
        # it will use the copy in .nemo
        vocab_file = self.register_artifact('vocab_file', 'vocab.txt')

        with open(vocab_file, 'r') as f:
            vocab = []
            vocab = f.read().split()[:-1]  # the -1 here is for the dangling token in the file
            self.vocab = vocab

        self._test_dl = self._setup_data_loader(test_data_config)
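
One possible reading of the problem: the class above writes the tokens with nothing between them (`f.write(f"{token}")`) and then splits on whitespace. If the vocabulary itself contains whitespace characters, as a character-level vocabulary would, that round trip cannot recover the original tokens. The double space in the traceback comment ("the dangling  token") also suggests that a separator string was lost when the code was copied. The sketch below shows a write/read round trip with an explicit separator, outside of NeMo. The separator value `<SEP>`, the helper names `write_vocab` and `read_vocab`, and the sample vocabulary are assumptions for illustration, not taken from the notebook.

```python
import os
import tempfile

SEP = "<SEP>"  # assumed separator; it must not occur inside any token


def write_vocab(path, vocab):
    """Write every token followed by SEP (this leaves one dangling SEP at the end)."""
    # newline="" disables newline translation so a "\n" token survives unchanged.
    with open(path, "w", encoding="utf-8", newline="") as f:
        for token in vocab:
            f.write(f"{token}{SEP}")


def read_vocab(path):
    """Read the tokens back; [:-1] drops the empty string after the dangling SEP."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        return f.read().split(SEP)[:-1]


# Hypothetical character-level vocabulary that includes whitespace tokens.
vocab = ["\n", " ", "a", "b"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.txt")
    write_vocab(path, vocab)
    restored = read_vocab(path)

print(restored == vocab)  # True

# For comparison: no separator on write plus a whitespace split on read,
# as in the class above, loses the vocabulary.
whitespace_split = "".join(vocab).split()[:-1]
print(whitespace_split)  # []
```

Applied to the class above, this would mean writing `f"{token}{SEP}"` in setup_training_data and calling `split(SEP)[:-1]` in both setup_validation_data and setup_test_data, with whatever separator the notebook actually uses.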

creating a NeMo model #8601