UKPLab / elmo-bilstm-cnn-crf

BiLSTM-CNN-CRF architecture for sequence tagging using ELMo representations.
Apache License 2.0

Performance on CoNLL-2003 #11

Open allanj opened 5 years ago

allanj commented 5 years ago

Can I ask what performance you obtained with your new implementation?

nreimers commented 5 years ago

Hi @allanj, out of the box, without any further tuning and with a rather simple model:

ELMo 5.5B embeddings: 91.81 +/- 0.19
ELMo 5.5B embeddings + GloVe word embeddings: 92.07 +/- 0.24
ELMo 5.5B embeddings + Komninos word embeddings: 92.13 +/- 0.17

allanj commented 5 years ago

Sorry for the late reply, but which layer of the hidden states do you use? The average or the final layer?

nreimers commented 5 years ago

I recommend average
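For reference, the 'average' mode collapses the three ELMo layers into a single vector per token. A minimal NumPy sketch (the shapes here are illustrative, assuming the usual three ELMo layers with 1024-dimensional vectors):

```python
import numpy as np

# ELMo yields one vector per token from each of its 3 layers:
# shape (layers, tokens, dims), e.g. (3, seq_len, 1024).
num_layers, seq_len, dim = 3, 5, 1024
elmo_vectors = np.random.randn(num_layers, seq_len, dim)

# Average over the layer axis -> one 1024-dim vector per token.
averaged = np.average(elmo_vectors, axis=0).astype(np.float32)
print(averaged.shape)  # (5, 1024)
```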

allanj commented 5 years ago

Thanks. I also found a weighted-average mode in neuralnets/ELMoWordEmbeddings.py; can I ask why it simply swaps the axes? If I'm not wrong, dimension 0 is the layer and dimension 1 is the position.

```python
def applyElmoMode(self, elmo_vectors):
    # elmo_vectors has shape (layers, tokens, dims), e.g. (3, seq_len, 1024)
    if self.elmo_mode == 'average':
        # Average the 3 ELMo layers -> (tokens, dims)
        return np.average(elmo_vectors, axis=0).astype(np.float32)
    elif self.elmo_mode == 'weighted_average':
        # Swap to (tokens, layers, dims); the weighting itself happens
        # later, inside the network (see the answer below)
        return np.swapaxes(elmo_vectors, 0, 1)
    elif self.elmo_mode == 'last':
        # Use only the top ELMo layer -> (tokens, dims)
        return elmo_vectors[-1, :, :]
    elif isinstance(self.elmo_mode, int):
        # Use one specific layer by index -> (tokens, dims)
        return elmo_vectors[int(self.elmo_mode), :, :]
    else:
        raise ValueError("Unknown ELMo mode: %s" % self.elmo_mode)
```
nreimers commented 5 years ago

The weights are added and trained as part of the neural network; the ElmoEmbeddings class hence only returns the 3 layers. To be compatible with the input of the neural network, the axes must be swapped from (layers, tokens, dims) to (tokens, layers, dims).
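To illustrate the idea (this is not the repository's exact layer, just a minimal sketch of a trainable weighted average in modern Keras; the class name WeightedAverage is hypothetical, and the input is assumed to be the swapped-axes tensor of shape (batch, tokens, layers, dims)):

```python
import tensorflow as tf
from tensorflow import keras

class WeightedAverage(keras.layers.Layer):
    """Trainable weighted average over the ELMo layer axis.

    Expects input of shape (batch, tokens, layers, dims) and
    returns (batch, tokens, dims).
    """
    def __init__(self, num_layers=3, **kwargs):
        super().__init__(**kwargs)
        self.num_layers = num_layers

    def build(self, input_shape):
        # One scalar weight per ELMo layer, learned jointly with the tagger.
        self.layer_weights = self.add_weight(
            name='layer_weights', shape=(self.num_layers,),
            initializer='zeros', trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        # Normalize the weights, then mix the layers into one vector per token.
        norm = tf.nn.softmax(self.layer_weights)
        return tf.einsum('btld,l->btd', inputs, norm)
```

With elmo_mode='weighted_average', each (tokens, 3, 1024) output of applyElmoMode would feed into a layer like this, so the three mixing weights are trained together with the rest of the network.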