Closed ShabnamRA closed 7 months ago
In the modified version below, split() is called without a separator argument, so it splits on runs of whitespace (spaces, tabs, newlines). This resolves the ValueError caused by the empty separator. The tutorial needs to be modified as follows:
class NeMoGPTv2(NeMoGPT):

    def setup_training_data(self, train_data_config: OmegaConf):
        self.vocab = None
        self._train_dl = self._setup_data_loader(train_data_config)

        # Save the vocab into a text file for now
        with open('vocab.txt', 'w') as f:
            for token in self.vocab:
                f.write(f"{token}")

        # This is going to register the file into .nemo!
        # When you later use .save_to(), it will copy this file into the tar file.
        self.register_artifact('vocab_file', 'vocab.txt')

    def setup_validation_data(self, val_data_config: OmegaConf):
        vocab_file = self.register_artifact('vocab_file', 'vocab.txt')

        with open(vocab_file, 'r') as f:
            vocab = f.read().split()[:-1]  # Split based on whitespace characters
            self.vocab = vocab

        self._validation_dl = self._setup_data_loader(val_data_config)

    def setup_test_data(self, test_data_config: OmegaConf):
        # This is going to try to find the same file, and if it fails,
        # it will use the copy in .nemo
        vocab_file = self.register_artifact('vocab_file', 'vocab.txt')

        with open(vocab_file, 'r') as f:
            vocab = f.read().split()[:-1]  # the -1 here is for the dangling token in the file
            self.vocab = vocab

        self._test_dl = self._setup_data_loader(test_data_config)
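
The difference between an empty separator and no separator can be checked in isolation. This is a minimal sketch in plain Python, independent of NeMo; the sample string is made up for illustration and is not taken from the tutorial's data:

```python
# Hypothetical sample text, not from the tutorial.
text = "a b\tc\nd"

# An empty-string separator is rejected by str.split.
try:
    text.split("")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With no separator, split() breaks on runs of whitespace.
tokens = text.split()

print(error_message)  # empty separator
print(tokens)         # ['a', 'b', 'c', 'd']
```

Note that whitespace splitting only recovers the vocabulary exactly if the tokens were written with whitespace between them and no token is itself a whitespace character.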
I am trying to learn NeMo from "tutorials/01_NeMo_Models.ipynb".
At the end of the notebook, after creating the NeMoGPTv2 class, I try to create a model:
model = NeMoGPTv2(cfg=cfg.model)
and I get the following error: