Closed · millenniumbismay closed this issue 5 months ago
I am using `meta-llama/Llama-2-7b-chat-hf` as the base model, in case that helps shed some light.
I got it... The -100 is automatically added by `transformers.DataCollatorForSeq2Seq()`; it is ignored by PyTorch loss functions but creates a problem when trying to decode back. Can you shed some light on whether we can convert the logits in `preprocess_logits_for_metrics` to labels, and then to text, using the following code?
```python
import torch

# Softmax is optional here: argmax over the raw logits yields the same token ids.
logits = logits.softmax(dim=-1)
predicted_labels = torch.argmax(logits, dim=-1)
print("Predicted:", tokenizer.batch_decode(predicted_labels, skip_special_tokens=False, clean_up_tokenization_spaces=True))
```
-100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so no loss is computed at positions labeled -100. In other words, if you fill a position with -100, that position is skipped during the loss computation.
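A minimal sketch of that behavior, using toy logits and labels (the names here are illustrative, not from this thread):

```python
import torch
import torch.nn as nn

# CrossEntropyLoss skips targets equal to ignore_index, which defaults to -100.
loss_fn = nn.CrossEntropyLoss()            # ignore_index=-100 by default
logits = torch.randn(4, 10)                # 4 positions, vocabulary of size 10
labels = torch.tensor([3, 7, -100, -100])  # last two positions are masked out

masked = loss_fn(logits, labels)
# With the default "mean" reduction, ignored positions are excluded from both
# the sum and the denominator, so this matches the loss over the first two
# positions alone:
unmasked = loss_fn(logits[:2], labels[:2])
print(masked.item(), unmasked.item())      # identical values
```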
Somehow, I am seeing -100 appended to the ground-truth labels inside `preprocess_logits_for_metrics`, and those values cannot be decoded back to a string by `tokenizer.batch_decode()`. Just to make sure: `train_on_inputs = True`, so the following block of code doesn't run -
I tested this by commenting out that part, and the labels still contain -100. Could anyone explain, please? I can remove them manually, but I don't understand why they appear in the first place.
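For anyone hitting the same thing: `DataCollatorForSeq2Seq` pads the labels in each batch to a common length with its `label_pad_token_id`, which defaults to -100, so the padding appears regardless of `train_on_inputs`. A minimal sketch of the usual workaround before decoding, mirroring the Hugging Face example scripts (`decode_labels` is a hypothetical helper; `labels` and `tokenizer` are assumed to come from your metrics context, and the tokenizer is assumed to have a pad token set, e.g. `tokenizer.pad_token = tokenizer.eos_token` for Llama-2):

```python
import numpy as np

def decode_labels(labels, tokenizer):
    # -100 is only a loss mask, not a real vocabulary id, so swap it for the
    # pad token id before handing the ids to the tokenizer.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    return tokenizer.batch_decode(labels, skip_special_tokens=True)
```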