I love your work. Thank you for sharing the code.

I have a question about the input and label loading in the `TrainDataset` class. When `self.rag = False`, the outputs from the `__getitem__` method were as follows. However, according to modeling_gpt2.py#L1330:
```python
loss = None
if labels is not None:
    # move labels to correct device to enable model parallelism
    labels = labels.to(lm_logits.device)
    # Shift so that tokens < n predict n
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # Flatten the tokens
    loss_fct = CrossEntropyLoss()
    loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
```
The loss is computed with `shift_logits` and `shift_labels`.
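To make the shifting concrete, here is a minimal sketch (plain Python, with hypothetical token ids) of how "tokens < n predict n" pairs each logit position with the next token:

```python
# Hypothetical token ids used only for illustration.
labels = [15332, 1302, 6476, 50256]
logit_positions = list(range(len(labels)))   # one logit vector per input position

# The same shift as in modeling_gpt2.py:
shift_logit_positions = logit_positions[:-1]  # drop the last position
shift_labels = labels[1:]                     # drop the first token

# Position i is scored against the token at position i + 1.
pairs = list(zip(shift_logit_positions, shift_labels))
print(pairs)  # [(0, 1302), (1, 6476), (2, 50256)]
```

So the last input position has no target, and the first token is never a target.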
Therefore, as I understand it, the `input_ids` and `label_ids` should be as follows:
Note that the positions of '15332, 1302, 6476, ....' should be the same in both variables.
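In other words, my understanding is that because the model performs the shift internally, the two variables should hold the same tokens at the same positions; a sketch with hypothetical token ids:

```python
# Hypothetical token ids, illustrating my understanding: since the shift
# happens inside the model, input_ids and label_ids should be identical,
# token for token, position for position.
input_ids = [15332, 1302, 6476, 50256]
label_ids = [15332, 1302, 6476, 50256]

for pos, (i, l) in enumerate(zip(input_ids, label_ids)):
    assert i == l, f"mismatch at position {pos}"
```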
Could you please point out where I might have misunderstood?