ai-forever / ru-gpts

Russian GPT3 models.
Apache License 2.0

KeyError: 'loss' when training the Small and Medium models #87

Closed andreinechaev closed 2 years ago

andreinechaev commented 2 years ago

I'm new to 🤗 transformers and am trying to use your models as pretrained checkpoints. Unfortunately, I keep getting the error KeyError: 'loss'.

import datasets
from transformers import AutoModel, AutoTokenizer, Trainer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/rugpt3small_based_on_gpt2")
model = AutoModel.from_pretrained("sberbank-ai/rugpt3small_based_on_gpt2").cuda()

def tokenize_function(examples, max_len=256):
    return tokenizer(examples['История болезни'], truncation=True, padding="max_length", max_length=max_len)

studmed_ds_train = datasets.load_dataset('csv', data_files='data/studmed_data.csv', split='train[:80%]')
studmed_ds_test = datasets.load_dataset('csv', data_files='data/studmed_data.csv', split='train[80%:]')

tokenizer.pad_token = tokenizer.eos_token

tokenized_datasets_train = studmed_ds_train.map(tokenize_function, batched=True)
tokenized_datasets_test = studmed_ds_test.map(tokenize_function, batched=True)

training_args = TrainingArguments("test_trainer")

model.config.n_embd = 256
# model.config.
trainer = Trainer(model=model, args=training_args, train_dataset=tokenized_datasets_train, eval_dataset=tokenized_datasets_test)

trainer.train()

which leads to the error:

/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in __getitem__(self, k)
   2595         if isinstance(k, str):
   2596             inner_dict = {k: v for (k, v) in self.items()}
-> 2597             return inner_dict[k]
   2598         else:
   2599             return self.to_tuple()[k]

KeyError: 'loss'

The full example is in Google Colab.

andreinechaev commented 2 years ago

okay, understood the difference between tokenizers. sorry for the noise.
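
For readers who hit the same traceback: the thread does not spell out the fix, so the following is only a plausible explanation, not the author's confirmed resolution. For GPT-2 checkpoints, `AutoModel` resolves to the bare `GPT2Model`, which has no language-modeling head and never returns a `loss`, while `Trainer` looks up `outputs["loss"]`. A model with an LM head (`GPT2LMHeadModel`, which is what `AutoModelForCausalLM` resolves to for GPT-2) returns a loss once `labels` are passed. The sketch below illustrates the difference offline with a tiny, randomly initialised config; the sizes are hypothetical and chosen only so that no checkpoint has to be downloaded.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel, GPT2Model

# Tiny random config (hypothetical sizes) so the sketch runs without downloads.
config = GPT2Config(vocab_size=100, n_positions=32, n_embd=16, n_layer=1, n_head=2)

# A fake batch of token ids: 2 sequences of 8 tokens each.
input_ids = torch.randint(0, config.vocab_size, (2, 8))

# Bare transformer, the class AutoModel picks for GPT-2: hidden states only.
base_model = GPT2Model(config)
base_out = base_model(input_ids=input_ids)
has_loss_base = "loss" in base_out.keys()

# Model with an LM head: passing labels makes it compute a loss.
lm_model = GPT2LMHeadModel(config)
lm_out = lm_model(input_ids=input_ids, labels=input_ids)
has_loss_lm = "loss" in lm_out.keys()

print("bare model returns loss:", has_loss_base)
print("LM-head model returns loss:", has_loss_lm)
```

Under this assumption, the script in the question would load the checkpoint with `AutoModelForCausalLM.from_pretrained(...)` and give `Trainer` a collator that supplies labels, for example `DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)`.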