chilynn / sequence-labeling

A few questions about the CRF layer in the model #28

Open Ethan1214 opened 6 years ago

Ethan1214 commented 6 years ago

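Hello, I have a few questions about parts of the implementation and hope you can help. In the batched CRF implementation, are the sentences in a batch concatenated and treated as a single sentence?

I ask because the computation of point_score does this:

    self.point_score = tf.gather(tf.reshape(self.tags_scores, [-1]), tf.range(0, self.batch_size * self.num_steps) * self.num_classes + tf.reshape(self.targets,[self.batch_size * self.num_steps]))

Is my understanding correct?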
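To make the indexing in that line concrete, here is a minimal pure-Python sketch of what the flatten-and-gather computes. The toy sizes and score values are hypothetical, and the list operations only mimic tf.reshape and tf.gather; this is not the repository's code. It suggests that this particular op only flattens the batch for indexing and picks the score of the target tag at each (sentence, step) position. Whether the rest of the CRF code treats the batch as one sentence cannot be told from this line alone.

```python
# Hypothetical toy sizes; the names mirror the snippet quoted above.
batch_size, num_steps, num_classes = 2, 3, 4

# tags_scores[b][t][c]: score of class c at step t of sentence b.
# The value 100*b + 10*t + c makes every entry easy to identify.
tags_scores = [[[100 * b + 10 * t + c for c in range(num_classes)]
                for t in range(num_steps)]
               for b in range(batch_size)]
targets = [[1, 3, 0],
           [2, 2, 1]]

# Stand-ins for tf.reshape(tags_scores, [-1]) and tf.reshape(targets, [-1]).
flat_scores = [s for sent in tags_scores for step in sent for s in step]
flat_targets = [t for sent in targets for t in sent]

# Stand-in for tf.range(0, batch_size * num_steps) * num_classes + flat_targets.
flat_index = [pos * num_classes + tag for pos, tag in enumerate(flat_targets)]

# Stand-in for tf.gather(flat_scores, flat_index).
point_score = [flat_scores[i] for i in flat_index]

# Direct per-sentence lookup, for comparison.
expected = [tags_scores[b][t][targets[b][t]]
            for b in range(batch_size) for t in range(num_steps)]

print(point_score)
print(point_score == expected)
```

Each gathered value belongs to one (sentence, step) pair, so sentence boundaries are not mixed by this op.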

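I also rewrote part of the code so that it now runs on a dataset with multi-class labels, but the loss becomes negative. Is this related to the constants set in the CRF layer?
According to the derivation, p(y|X) is a value between 0 and 1, so its log is negative, and negating that to get the loss should give a positive number.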
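The expectation about the sign can be written out as follows. This is a sketch under the usual linear-chain CRF definitions; the symbols s (path score) and Z (partition function) are introduced here for illustration and do not come from the repository.

```latex
\begin{aligned}
p(y \mid X) &= \frac{\exp s(X, y)}{Z(X)}, \qquad Z(X) = \sum_{y'} \exp s(X, y'),\\
\text{loss} &= -\log p(y \mid X) = \log Z(X) - s(X, y) \;\ge\; 0 .
\end{aligned}
```

The inequality holds only when the gold path y is one of the paths y' summed in Z(X) and is scored with exactly the same terms. A negative loss therefore suggests that the gold-path score and the all-paths score are being computed inconsistently.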

tomsonsgs commented 6 years ago

I solved the problem by changing the if condition from "i==0" to "j==0" in the getTransition(y_train_batch) function, and the loss now comes down to 0 as expected. You can all check that. Will this improve the final accuracy? Someone could try it.

Ethan1214 commented 6 years ago

Hi @tomsonsgs: I have tried your method. It works and keeps the loss above 0, but the accuracy did not improve noticeably. I don't understand why you break when j==0. In my opinion, breaking when j==0 means we ignore the transition score from the last word of a sentence to the ending_tag(""). For example, suppose we have a true label sequence during training: B M E O O B E ........ If we break when j==0, we ignore the transition score of "E to ". Although my loss went below 0, I think the original method is right.

Can you tell me why you break when j==0?

tomsonsgs commented 6 years ago

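Because using i==0 adds one extra transition to the target path, from the last character to the padding tag, whereas the score over all paths is only computed up to the last character's own score and does not add the transition score after it.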

tomsonsgs commented 6 years ago

If you look at how the original code computes the score over all paths, you will see it: it does not add the final transition score.
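The mismatch described here can be checked on a toy example. Below is a minimal sketch by brute-force enumeration, assuming hypothetical emission and transition scores and a hypothetical to_pad vector for the transition into the padding tag. It is not the repository's code; it only illustrates that the loss can go negative when the gold path includes a term that the all-paths score does not.

```python
import itertools
import math

# Hypothetical toy scores: 2 tags, 3 steps. Not taken from the repository.
emit = [[5.0, 0.0],   # emit[t][tag]
        [0.0, 5.0],
        [5.0, 0.0]]
trans = [[0.0, 1.0],  # trans[prev_tag][next_tag]
         [1.0, 0.0]]
to_pad = [0.5, 0.5]   # hypothetical transition score from a tag to the padding tag
gold = [0, 1, 0]


def path_score(path, add_final_transition):
    """Sum of emission scores and tag-to-tag transition scores along a path."""
    score = sum(emit[t][tag] for t, tag in enumerate(path))
    score += sum(trans[a][b] for a, b in zip(path, path[1:]))
    if add_final_transition:
        score += to_pad[path[-1]]  # the extra "last tag to padding" term
    return score


def log_z(add_final_transition):
    """Log of the sum of exp(score) over all paths, by enumeration."""
    scores = [path_score(p, add_final_transition)
              for p in itertools.product(range(2), repeat=len(emit))]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


# Gold path and all paths scored the same way: the loss is non-negative.
consistent_loss = log_z(False) - path_score(gold, False)
# Gold path has the extra final transition, all paths do not: the loss can go negative.
mismatched_loss = log_z(False) - path_score(gold, True)
# Extra transition added on both sides: consistent again.
both_with_final = log_z(True) - path_score(gold, True)

print(consistent_loss >= 0, mismatched_loss < 0, both_with_final >= 0)
```

Under these assumptions there are two ways to restore consistency: drop the extra term from the gold path, or add it to every path in the all-paths score.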

Ethan1214 commented 6 years ago

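@tomsonsgs Looking at the forward computation of the total path score, I can't see where the use of the transition matrix omits the final transition score. Could you point out which specific steps do this?

Many thanks!

last_alphas = tf.gather(alphas, tf.range(0, self.batch_size) * (self.num_steps + 2) + length)

Would it work to change length to length+1 here? Then the i==0 condition should not need to be changed, right?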
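To see which rows that gather selects, here is a small pure-Python sketch. The layout of alphas is an assumption (one row per sentence and step, with num_steps + 2 steps per sentence), and the toy sizes, lengths, and the gather_rows helper are hypothetical. It only shows which (sentence, step) row is picked for length and for length+1. Which of the two is the right final step depends on how the padded steps are laid out, and the thread does not show that.

```python
# Hypothetical toy sizes. Assumed layout: alphas has one row per
# (sentence, step), with num_steps + 2 steps per sentence.
batch_size, num_steps = 2, 3
steps_per_sentence = num_steps + 2

# Label each row with (sentence, step) so the gathered rows are easy to read.
alphas = [(b, t) for b in range(batch_size) for t in range(steps_per_sentence)]
length = [3, 2]  # hypothetical true lengths of the two sentences


def gather_rows(offset):
    """Mimic tf.gather(alphas, tf.range(0, batch_size) * (num_steps + 2) + length + offset)."""
    index = [b * steps_per_sentence + length[b] + offset
             for b in range(batch_size)]
    return [alphas[i] for i in index]


with_length = gather_rows(0)           # rows picked by the original line
with_length_plus_one = gather_rows(1)  # rows picked if length becomes length+1

print(with_length)
print(with_length_plus_one)
```

With length+1 the gathered alpha sits one step later for every sentence, so it would include one more transition than with length.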

fxh0919 commented 6 years ago

May I ask why dummy_val in the CRF layer is set to -1000? Is there a particular reason for that value?

tomsonsgs commented 6 years ago

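@fxh0919 It is just a very small value, meaning that the score of a node taking that class is extremely low. It could equally be -2000 and so on. At the start, the sequence is necessarily in the start state, so the probability of every other class is 0; in log space, a very large negative value is normally used to represent a probability close to 0.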
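A quick way to see why the exact constant does not matter is sketched below, assuming the scores are combined with a log-sum-exp in double precision. The log_sum_exp helper and the initial score vector are hypothetical and do not come from the repository.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-scores."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


dummy_val = -1000.0

# Hypothetical initial log-scores: the start state gets 0, every other class gets dummy_val.
init = [0.0, dummy_val, dummy_val]

# exp(-1000) underflows to exactly 0.0 in double precision,
# so the dummy classes contribute nothing to the sum.
underflows = math.exp(dummy_val) == 0.0

total_1000 = log_sum_exp(init)
total_2000 = log_sum_exp([0.0, -2000.0, -2000.0])

print(underflows, total_1000, total_2000)
```

Both constants give the same result. A finite value such as -1000 also avoids arithmetic on negative infinity, which can yield NaN (for example, -inf minus -inf); that may be one reason a finite constant is used.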