Ethan-Chen-plus opened this issue 1 year ago
This code is just the standard way of calculating a model's loss over a given set of tokens.
Say you have a sentence of tokens (t1, ..., tn). If you feed this into the model (`model(input_ids)`), the model outputs n prediction vectors, where the i-th prediction vector contains the model's prediction for the i-th token given the first (i-1) tokens.
Then, to calculate a loss, we feed in the same tokens as ground truth. So the model's prediction for the i-th token gets compared with the actual i-th token of the sentence.
In the paper, we just compare two models by looking at the perplexities they assign to the same sentence.
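To make the comparison above concrete, here is a minimal plain-Python sketch. It assumes (hypothetically) that we already have the probability each model assigned to every actual next token of the sentence; the perplexity is just the exponentiated mean negative log-likelihood of those probabilities:

```python
import math

def sentence_perplexity(token_probs):
    """Perplexity of a sentence, given the probability the model assigned
    to each actual next token (a hypothetical list of floats in (0, 1])."""
    # mean negative log-likelihood over the predicted tokens
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Made-up per-token probabilities from two models on the same sentence:
probs_a = [0.5, 0.8, 0.9, 0.7]
probs_b = [0.3, 0.4, 0.6, 0.5]

# The model with lower perplexity assigns the sentence higher probability.
print(sentence_perplexity(probs_a) < sentence_perplexity(probs_b))
```

A model that has memorized the sentence would assign it unusually high per-token probabilities, hence unusually low perplexity.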
Thanks a lot! But I am wondering about this:
```python
def compute_loss(self, model, inputs, return_outputs=False):
    return model(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        position_ids=inputs["position_ids"],
        labels=inputs["labels"],
    ).loss
```
But in this repo, the code uses `input_ids` as the labels; I thought we would use `inputs["labels"]` as the labels. @ftramer
Where is this code from? The code we use is the standard way of calculating perplexity in HuggingFace.
I know that the following can calculate the loss. However, why should the labels be the `input_ids`? After reading the paper, I think the code should be:
This can test and verify whether the outputs of the two models are the same. If they differ, maybe one of the models memorizes the training data.