Open fancyerii opened 6 years ago
Agreed. I think the size of the output should be 2 * num_labels + 1 = 2 * 42 + 1 = 85 (in this case), which means that the arc label should be one of the things being predicted.
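To make the count concrete, here is a minimal sketch, assuming an arc-standard-style transition system with one SHIFT action plus one LEFT-ARC and one RIGHT-ARC action per label. The function name `build_transitions` and the placeholder label names are hypothetical and are not taken from the code under discussion.

```python
def build_transitions(labels):
    """Enumerate transitions: SHIFT, plus LEFT-ARC and RIGHT-ARC for every label."""
    transitions = ["SHIFT"]
    for label in labels:
        transitions.append(f"LEFT-ARC({label})")
        transitions.append(f"RIGHT-ARC({label})")
    return transitions


# Placeholder label names; only the count (42) comes from the discussion.
labels = [f"label_{i}" for i in range(42)]
transitions = build_transitions(labels)

# 2 * num_labels + 1 output classes
print(len(transitions))  # 85
```

With this layout the classifier picks the transition and the arc label in a single decision, so the label does not need a separate output.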
Only after all of a word's dependents have been predicted can we use that word's children as inputs to predict its head and arc label.
When parsing dependencies without labels (such as nsubj, nmod), we can't get the label of s0's leftmost child, or the label of s0's child's child, so we can't use it as an input. extract_for_current_state gets the leftmost child of s0 and then looks up that child's label, but for unlabeled parsing this information is not available.
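The problem can be illustrated with a small sketch. It assumes arcs are stored as (head, dependent, label) triples with token indices in sentence order, and that a missing label is represented by `None`. The helper names and the `<NULL>` placeholder are hypothetical; this is not the actual `extract_for_current_state` implementation.

```python
NULL = "<NULL>"  # hypothetical placeholder feature for "no label available"


def leftmost_child(arcs, head):
    """Return the index of the leftmost dependent of `head`, or None if it has none."""
    children = [dep for h, dep, _ in arcs if h == head]
    return min(children) if children else None


def leftmost_child_label(arcs, head):
    """Return the label of the leftmost dependent of `head`, or NULL if unavailable."""
    child = leftmost_child(arcs, head)
    if child is None:
        return NULL
    for h, dep, label in arcs:
        if h == head and dep == child:
            return label if label is not None else NULL
    return NULL


# Labeled parsing: the label feature carries information.
labeled = [(2, 1, "nsubj"), (2, 3, "nmod")]
print(leftmost_child_label(labeled, 2))  # nsubj

# Unlabeled parsing: the child is known, but its label is not.
unlabeled = [(2, 1, None), (2, 3, None)]
print(leftmost_child_label(unlabeled, 2))  # <NULL>
```

In the unlabeled case the child index is still usable as a feature, but every label feature collapses to the same placeholder, so it adds nothing to the input.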