Question about the seq2seq loss computation

920232796 / bert_seq2seq

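A PyTorch implementation of BERT for seq2seq tasks using the UniLM scheme. It can now also handle automatic summarization, text classification, sentiment analysis, NER, part-of-speech tagging and other tasks, supports the T5 model, and supports GPT2 for article continuation.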

Apache License 2.0

1.28k stars, 208 forks

Question about the seq2seq loss computation #37

Open fengxin619 opened 3 years ago

fengxin619 commented 3 years ago

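In seq2seq_model.py, line 108:

A special output mask needs to be built to block out the influence of sentence a. The predicted values do not need the result at the final sep token, so the slice stops at -1.

predictions = predictions[:, :-1].contiguous()
target_mask = token_type_id[:, 1:].contiguous()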

Why does target_mask drop the [CLS] position while predictions drops the [SEP] position? Doesn't that misalign the two when the loss is computed?

920232796 commented 3 years ago

There is no misalignment. Think it through again carefully: the last position of predictions is sep, and the output at that sep is meaningless.

fengxin619 commented 3 years ago

Wow, that was a quick reply. But target_mask drops the first position? Is that the position corresponding to [CLS]?

920232796 commented 3 years ago

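Yes. If the sentence is [cls, 1, 2, sep, 3, 4, sep], then the prediction outputs we look at are the results at the tokens [sep, 3, 4], so [cls, 1, 2] is masked out, and that is what target_mask is for. The token_type_id for this sentence is [0, 0, 0, 0, 1, 1, 1]; taken from the second position onward it becomes [0, 0, 0, 1, 1, 1]. The prediction output is the result corresponding to [cls, 1, 2, sep, 3, 4]. Line the two up and you will see that the outputs of the first three tokens are exactly the ones masked out.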
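The alignment described above can be checked by hand. The following is a minimal sketch, not the repository's code: plain Python lists stand in for one row of the `predictions` and `token_type_id` tensors, and the names `prediction_positions`, `kept` and `masked` are made up for illustration.

```python
# Plain Python lists stand in for one row of the tensors from the thread.
tokens = ["cls", "1", "2", "sep", "3", "4", "sep"]
token_type_id = [0, 0, 0, 0, 1, 1, 1]

# Mirrors predictions[:, :-1]: keep the output at every input position
# except the final sep.
prediction_positions = tokens[:-1]

# Mirrors token_type_id[:, 1:]: drop the first entry so the mask has the
# same length as the sliced predictions.
target_mask = token_type_id[1:]

# Pair each remaining output position with its mask value.
kept = [tok for tok, m in zip(prediction_positions, target_mask) if m == 1]
masked = [tok for tok, m in zip(prediction_positions, target_mask) if m == 0]

print("kept:  ", kept)
print("masked:", masked)
```

Under these assumptions the kept positions are the outputs at `sep`, `3` and `4`, and the masked positions are the outputs at `cls`, `1` and `2`, which matches the description in the reply.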

fengxin619 commented 3 years ago

0, 0, 0, 0, 1, 1

Impressive... I finally get the brain teaser!

fengxin619 commented 3 years ago

But in 写诗_train.py, line 127: target_ids_padded = token_ids_padded[:, 1:].contiguous(). Why is target_id taken starting from index 1?

920232796 commented 3 years ago

How do you think it should be taken?

fengxin619 commented 3 years ago

Like this: target_ids_padded = token_ids_padded[:, :-1].contiguous()? ... Please set me straight.

920232796 commented 3 years ago

No. This is the target, so how could it contain the first token? It should run from the second token through the last token.
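To make the "second token through the last token" rule concrete, here is a small sketch under the same assumptions as the thread's example. It uses plain Python lists instead of tensors, and `input_positions` and `scored_pairs` are hypothetical names used only for this illustration.

```python
tokens = ["cls", "1", "2", "sep", "3", "4", "sep"]
token_type_id = [0, 0, 0, 0, 1, 1, 1]

# Output positions that are kept (mirrors predictions[:, :-1]).
input_positions = tokens[:-1]
# Targets run from the second token through the last token
# (mirrors token_ids_padded[:, 1:]).
targets = tokens[1:]
# Mask sliced the same way as the targets (mirrors token_type_id[:, 1:]).
target_mask = token_type_id[1:]

# Each output position is trained to predict the token that follows it;
# only pairs whose mask value is 1 would contribute to the loss.
scored_pairs = [
    (inp, tgt)
    for inp, tgt, m in zip(input_positions, targets, target_mask)
    if m == 1
]

print(scored_pairs)
```

With this slicing, the output at `sep` is scored against `3`, the output at `3` against `4`, and the output at `4` against the closing `sep`, so every scored position predicts its successor.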

fengxin619 commented 3 years ago

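In this example, the target should be [cls, 1, 2, sep, 3, 4, sep], and the prediction output is the result corresponding to [cls, 1, 2, sep, 3, 4]. So shouldn't the target start from the first position and drop the last one?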
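One way to compare the two candidate slicings is to count how often a position's target would equal the token that position just read as input. This is an editorial sketch and not something stated in the thread; the lists and the names `candidates` and `copies` are assumptions for illustration.

```python
tokens = ["cls", "1", "2", "sep", "3", "4", "sep"]

# Output positions that are kept (mirrors predictions[:, :-1]).
input_positions = tokens[:-1]

# The two target slicings discussed in the thread.
candidates = {
    "token_ids[:, 1:]": tokens[1:],    # second token through the last
    "token_ids[:, :-1]": tokens[:-1],  # first token through the second to last
}

# For each candidate, count positions whose target is the same token
# that was fed in at that position.
copies = {}
for name, targets in candidates.items():
    copies[name] = sum(inp == tgt for inp, tgt in zip(input_positions, targets))

for name, count in copies.items():
    print(f"{name}: {count} of {len(input_positions)} targets equal their own input")
```

In this toy example the `[:, :-1]` slicing makes every target identical to its own input token, so the model would only be asked to copy what it already sees, while the `[:, 1:]` slicing asks each position for the next token.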

920232796 commented 3 years ago

Think it over again.

fengxin619 commented 3 years ago

Then I'll think it over again..
