ShannonAI / mrc-for-flat-nested-ner

Code for ACL 2020 paper `A Unified MRC Framework for Named Entity Recognition`

About the shape of BERT output #84

Open BillXuce opened 3 years ago

BillXuce commented 3 years ago

According to Section 3.3.1 of the paper, the input to BERT is the concatenation of the query and the context, with length seq_len = n + m + 2, and the representations of the query and the special tokens are supposed to be dropped from the output. However, in bert_query_ner.py lines 44 and 45, sequence_heatmap taken from the BERT output has shape [batch_size, seq_len, hidden_size], i.e. it still covers the query, which conflicts with the paper. So which method should be applied, and how much difference is there between the two in performance?

```python
bert_outputs = self.bert(input_ids, token_type_ids=token_type_ids, attention_mask=attention_mask)

sequence_heatmap = bert_outputs[0]  # [batch, seq_len, hidden]
batch_size, seq_len, hid_size = sequence_heatmap.size()

start_logits = self.start_outputs(sequence_heatmap).squeeze(-1)  # [batch, seq_len]
end_logits = self.end_outputs(sequence_heatmap).squeeze(-1)  # [batch, seq_len]
```
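For reference, here is a minimal sketch of how such a query + context pair is typically encoded with Hugging Face `transformers` (the checkpoint name and example strings are illustrative only); the encoder output covers the full concatenated sequence, query and special tokens included:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Illustrative query/context pair; in the MRC-NER setup the query encodes the entity category.
query = "Find all person entities in the text."
context = "Barack Obama was born in Hawaii ."

# Builds [CLS] query [SEP] context [SEP]; token_type_ids are 0 over the
# query segment and 1 over the context segment.
encoded = tokenizer(query, context, return_tensors="pt")

print(encoded["input_ids"].shape)    # [1, seq_len] -- covers query, context, and special tokens
print(encoded["token_type_ids"][0])  # 0s over the query span, 1s over the context span
```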

seanswyi commented 3 years ago

Unfortunately, there are a few places in the repository where the code conflicts with the paper. I'm assuming that when the authors say they "dropped" the query portion, what they mean is that the start/end label masks are applied when the loss is calculated, so the query and special-token positions never contribute to training, something like the sketch below. I don't know for sure; it's just my assumption.
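A minimal sketch of that kind of masked loss, assuming a per-token binary cross-entropy over start positions (the function and variable names here, e.g. `start_label_mask`, are my own illustrations, not necessarily what the repo uses):

```python
import torch
import torch.nn.functional as F

def masked_start_loss(start_logits, start_labels, start_label_mask):
    # start_logits:     [batch, seq_len] raw scores from the start classifier
    # start_labels:     [batch, seq_len] 0/1 gold labels per token
    # start_label_mask: [batch, seq_len] 1 on context tokens, 0 on query/special tokens
    loss = F.binary_cross_entropy_with_logits(
        start_logits, start_labels.float(), reduction="none"
    )  # per-token loss, [batch, seq_len]
    loss = loss * start_label_mask.float()  # "drop" query/special-token positions
    # average only over the unmasked (context) positions
    return loss.sum() / start_label_mask.float().sum().clamp(min=1.0)
```

With a mask like this, keeping the query representations in sequence_heatmap is harmless: their logits are computed but never affect the gradient, which would make the code consistent with the paper's description in effect if not in form.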