Dhanachandra / bert_crf

BERT CRF model for Named Entity Recognition in pytorch
MIT License
28 stars 15 forks

add bilstm layer #2

Closed rajae-Bens closed 3 years ago

rajae-Bens commented 3 years ago

Hi,

It's me again. I opened a new issue because I tried adding a BiLSTM layer to see whether it would enhance performance. The improvement was not impressive: about 2% between the two models.

I wanted to share how I added the BiLSTM layer and check whether I did it correctly, because I am a newbie in deep learning and this is my first model for a study project.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchcrf import CRF
    from transformers import BertModel, BertPreTrainedModel

    class Bert_Bilstm_CRF(BertPreTrainedModel):
        def __init__(self, config):
            # was super(Bert_CRF, ...): must name this class
            super(Bert_Bilstm_CRF, self).__init__(config)
            self.num_labels = config.num_labels
            self.bert = BertModel(config)
            # hidden_size 768//2 per direction, so the concatenated
            # BiLSTM output matches BERT's 768-dim hidden size
            self.rnn = nn.LSTM(bidirectional=True, num_layers=2,
                               input_size=768, hidden_size=768 // 2,
                               batch_first=True)
            self.dropout = nn.Dropout(config.hidden_dropout_prob)
            self.classifier = nn.Linear(config.hidden_size, self.num_labels)
            self.init_weights()
            self.crf = CRF(self.num_labels, batch_first=True)

        def forward(self, input_ids, attn_masks, labels=None):
            outputs = self.bert(input_ids, attn_masks)
            sequence_output = outputs[0]        # (batch, seq_len, hidden)
            enc, _ = self.rnn(sequence_output)  # BiLSTM over BERT output
            sequence_output = self.dropout(enc)
            emission = self.classifier(sequence_output)
            attn_masks = attn_masks.type(torch.uint8)
            if labels is not None:
                # negative log-likelihood from the CRF on log-softmax emissions
                loss = -self.crf(F.log_softmax(emission, 2), labels,
                                 mask=attn_masks, reduction='mean')
                return loss
            else:
                prediction = self.crf.decode(emission, mask=attn_masks)
                return prediction
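A minimal sketch of the BiLSTM shape arithmetic the model above relies on, assuming only torch is installed (no BERT weights needed): with `bidirectional=True` and `hidden_size=768 // 2`, the two directions' outputs are concatenated, so the LSTM output keeps BERT's 768-dim hidden size and can feed the same `nn.Linear(768, num_labels)` classifier.

    import torch
    import torch.nn as nn

    # same LSTM configuration as in the model above
    rnn = nn.LSTM(bidirectional=True, num_layers=2, input_size=768,
                  hidden_size=768 // 2, batch_first=True)

    # fake BERT output: (batch=2, seq_len=16, hidden=768)
    x = torch.randn(2, 16, 768)
    out, _ = rnn(x)
    print(out.shape)  # torch.Size([2, 16, 768])

Because the last dimension is unchanged, the BiLSTM drops into the original Bert_CRF forward pass without touching the classifier or CRF layers.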
Dhanachandra commented 3 years ago

Yes, you defined the model correctly.

rajae-Bens commented 3 years ago

ok thanks for answering