Thanks for the authors' work. Could you provide the pretraining script? Also, could you briefly explain this loss-computation code? The paper doesn't seem to describe it in much detail:
```python
_, loss_mask, position_ids = get_ltor_masks_and_position_ids_from_embeddings(input_embeds)

# Calculate the loss_mask
non_padding_mask = non_padding_mask.long()
non_media_mask = non_media_mask.long()
prompt_mask = prompt_mask.long()
# TODO How to deal with prompt mask
# from icecream import ic
# non_padding_mask = non_padding_mask[:,:-1]
# non_media_mask = non_media_mask[:,:-1]
# prompt_mask = prompt_mask[:,:-1]
# attention_mask = attention_mask[:,:-1]
loss_mask = loss_mask[:, :-1]

loss_mask = loss_mask * non_padding_mask * non_media_mask * prompt_mask
labels[:, 1:][loss_mask != 1] = -100

# Forward into GPT
outputs = self.language_model(
    inputs_embeds=input_embeds,
    attention_mask=attention_mask,
    labels=labels,
    return_dict=return_dict,
    output_attentions=self.config.output_attentions,
)
# outputs.loss = (outputs.loss * loss_mask.view(-1)).sum() / loss_mask.sum()
```
We only apply the LM loss in both stage 1 and stage 2; it simply computes the loss on the response part.
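To make the masking concrete, here is a minimal toy sketch (hypothetical tensor values and shapes, not the repository's actual data) of how the combined mask restricts the cross-entropy to the response tokens: every label position whose mask is 0 gets set to -100, which the Hugging Face language-model loss ignores.

```python
import torch

# Toy sequence of 6 tokens: [media, media, prompt, prompt, response, response]
labels = torch.tensor([[11, 12, 13, 14, 15, 16]])

# Masks are shifted by one (length 5): position i gates the prediction of labels[i + 1].
# 1 = keep in the loss, 0 = mask out (values here are purely illustrative).
non_padding_mask = torch.tensor([[1, 1, 1, 1, 1]])  # no padding in this toy example
non_media_mask   = torch.tensor([[0, 1, 1, 1, 1]])  # drop predictions of media tokens
prompt_mask      = torch.tensor([[0, 0, 0, 1, 1]])  # keep only the response tokens
loss_mask        = torch.ones(1, 5, dtype=torch.long)  # plays the role of loss_mask[:, :-1]

loss_mask = loss_mask * non_padding_mask * non_media_mask * prompt_mask
labels[:, 1:][loss_mask != 1] = -100  # -100 is ignored by the HF cross-entropy loss

print(labels)  # tensor([[  11, -100, -100, -100,   15,   16]])
```

The first position (11) is never a prediction target because the causal LM shifts labels internally, so after masking only the response tokens contribute to the loss.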