UKPLab / emnlp2017-bilstm-cnn-crf

BiLSTM-CNN-CRF architecture for sequence tagging
Apache License 2.0

Multi-task settings #48

Open Mahmedturk opened 5 years ago

Mahmedturk commented 5 years ago

Hi @nreimers

For the multi-task framework, does it always have to be POS and chunking, or can it be any sequence labelling task?

nreimers commented 5 years ago

Hi. You can of course use it for any sequence tagging task. The POS and chunking code is just an example, which you can modify for your use case.
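If you want to plug in your own task, you just describe your CoNLL-style data in the datasets dict that the training scripts build. A minimal sketch, following the dataset-dict format from the repo's README; the dataset name my_ner and the column layout are hypothetical placeholders:

```python
# Sketch of a datasets dict for an arbitrary sequence tagging task.
# The scripts expect CoNLL-style train.txt / dev.txt / test.txt files
# under data/my_ner/ (one token per line, columns separated by whitespace).
datasets = {
    'my_ner':                                     # hypothetical dataset name
        {'columns': {0: 'tokens', 1: 'NER_BIO'},  # column index -> column name
         'label': 'NER_BIO',                      # the column the model should predict
         'evaluate': True,                        # compute dev/test scores on this task
         'commentSymbol': None}                   # lines starting with this symbol are skipped
}
```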

Mahmedturk commented 5 years ago

I am really confused about the architecture of the multi-task framework, as there is no diagram in the original paper. Could you please explain which layers are shared? In the example you have given, is POS at the lower level because it appears first in the constructor? If each task has its own task-specific BiLSTM-CRF network, which of the layers are shared? Can you show this graphically?

nreimers commented 5 years ago

In the Train_MultiTask.py example, the POS and chunking networks both share the embedding layer and one LSTM layer. If you change the params like this:

params = {'classifier': ['CRF'], 'LSTM-Size': [100, 50], 'dropout': (0.25, 0.25)}

Both networks would then share two stacked LSTM layers, the first with 100 recurrent units and the second with 50.

In that file, only the CRF classifier is task-specific.

If you run the code, the model architecture is also printed. Shared layers have names beginning with shared..., while task-specific layers have names beginning with POS... and chunking_...

The order in the datasets dict doesn't matter.
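Putting the pieces together, here is a minimal sketch of a Train_MultiTask.py-style setup with the two stacked shared LSTM layers from above. The helper and method names are taken from the repo's example scripts (including the perpareDataset spelling used there); treat the embeddings path as a placeholder and check the scripts for the exact signatures:

```python
# Sketch of a multi-task setup in the style of Train_MultiTask.py.
# With 'LSTM-Size': [100, 50], both tasks share two stacked BiLSTM layers;
# only the CRF classifiers are task-specific.
from neuralnets.BiLSTM import BiLSTM
from util.preprocessing import perpareDataset, loadDatasetPickle  # sic: 'perpare' in the repo

datasets = {
    'unidep_pos':         {'columns': {1: 'tokens', 3: 'POS'},
                           'label': 'POS', 'evaluate': True, 'commentSymbol': None},
    'conll2000_chunking': {'columns': {0: 'tokens', 2: 'chunk_BIO'},
                           'label': 'chunk_BIO', 'evaluate': True, 'commentSymbol': None},
}

embeddingsPath = 'komninos_english_embeddings.gz'      # placeholder embeddings file
pickleFile = perpareDataset(embeddingsPath, datasets)  # preprocess and pickle the data
embeddings, mappings, data = loadDatasetPickle(pickleFile)

params = {'classifier': ['CRF'], 'LSTM-Size': [100, 50], 'dropout': (0.25, 0.25)}
model = BiLSTM(params)
model.setMappings(mappings, embeddings)
model.setDataset(datasets, data)
model.fit(epochs=25)
```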

In Train_MultiTask_Different_Levels.py, the POS classifier uses the output of the first LSTM layer, while chunking gets a task-specific LSTM layer:

params = {'classifier': ['CRF'], 'LSTM-Size': [100], 'dropout': (0.25, 0.25),
          'customClassifier': {'unidep_pos': ['Softmax'], 'conll2000_chunking': [('LSTM', 50), 'CRF']}}

Both tasks share the one LSTM layer defined by 'LSTM-Size'. POS then uses a softmax classifier directly on top of that shared LSTM layer; chunking, in contrast, uses a task-specific LSTM with 50 units followed by a CRF.
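To make the wiring explicit, here is a purely illustrative Keras sketch of that two-level architecture, not the repo's actual code: the vocabulary and label-set sizes are made up, and the chunking CRF is replaced by a softmax head, since a CRF layer is not part of core Keras (the repo ships its own ChainCRF implementation):

```python
# Illustrative sketch of the architecture in Train_MultiTask_Different_Levels.py.
from keras.layers import Input, Embedding, Bidirectional, LSTM, TimeDistributed, Dense
from keras.models import Model

tokens = Input(shape=(None,), dtype='int32')                   # word indices
emb = Embedding(input_dim=50000, output_dim=300)(tokens)       # shared embeddings (sizes made up)
shared = Bidirectional(LSTM(100, return_sequences=True))(emb)  # shared BiLSTM ('LSTM-Size': [100])

# POS head: supervised directly on the shared layer (lower level), softmax classifier
pos_out = TimeDistributed(Dense(45, activation='softmax'), name='pos')(shared)

# Chunking head: task-specific BiLSTM with 50 units on top of the shared layer,
# then a classifier (a CRF in the repo; softmax here for simplicity)
chunk_lstm = Bidirectional(LSTM(50, return_sequences=True))(shared)
chunk_out = TimeDistributed(Dense(23, activation='softmax'), name='chunking')(chunk_lstm)

# One model per task; the shared layers receive gradient updates from both tasks
pos_model = Model(inputs=tokens, outputs=pos_out)
chunking_model = Model(inputs=tokens, outputs=chunk_out)
```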

Mahmedturk commented 5 years ago

OK, thanks for the detailed answer. Could you also explain the difference between Train_MultiTask.py and Train_MultiTask_Different_Levels.py?

nreimers commented 5 years ago

Have a look at this paper: https://www.aclweb.org/anthology/P16-2038

Train_MultiTask_Different_Levels.py implements the ideas from that paper.

Mahmedturk commented 5 years ago

Thanks for the link.