Open Mahmedturk opened 5 years ago
Hi. You can of course use it for any sequence tagging task. The POS and chunking code is just an example, which you can modify for your own use case.
I am really confused about the architecture of the multi-task framework, as there is no diagram in the original paper. Could you please explain which layers are shared? In the example you gave, is POS at the lower level because it appears first in the constructor? Each task has its own task-specific Bi-LSTM-CRF network, so which of the layers are shared? Can you show this graphically?
In the Train_MultiTask.py example, the POS and chunking networks share the embedding layer and one LSTM layer. If you change params like this:
params = {'classifier': ['CRF'], 'LSTM-Size': [100, 50], 'dropout': (0.25, 0.25)}
Both networks would share 2 stacked LSTM layers, the first with 100 recurrent units, the second with 50.
In that file, only the CRF is task-specific.
If you run the code, the model architecture is also printed. Shared layers have the name shared..., while task-specific layers have the name POS... and chunking_....
The order in the datasets dict doesn't matter.
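To make the sharing described above a bit more concrete, here is a minimal sketch in plain Python. It is not the repository's code: `shared_stack` is a hypothetical helper, and the layer names it produces are made up for illustration (the real names are whatever the printed model summary shows). It only lists which layers each task would pass through for a given params dict, assuming every entry in 'LSTM-Size' becomes one shared LSTM layer and the classifier is the only task-specific part.

```python
def shared_stack(params, tasks):
    """Hypothetical helper: list the layers each task passes through.

    Assumes one shared embedding layer, one shared LSTM layer per entry
    in params['LSTM-Size'], and one task-specific classifier per task.
    Layer names are illustrative only.
    """
    shared = ["shared_embedding"]
    for i, units in enumerate(params["LSTM-Size"], start=1):
        shared.append(f"shared_LSTM_{i}({units})")

    plan = {}
    for task in tasks:
        # Only the classifier on top of the shared stack belongs to the task.
        heads = [f"{task}_{clf}" for clf in params["classifier"]]
        plan[task] = shared + heads
    return plan


params = {'classifier': ['CRF'], 'LSTM-Size': [100, 50], 'dropout': (0.25, 0.25)}
plan = shared_stack(params, ["unidep_pos", "conll2000_chunking"])

for task, layers in plan.items():
    print(task, "->", " -> ".join(layers))
```

With these params, both tasks list the same embedding and two LSTM entries and differ only in the final CRF entry.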
In Train_MultiTask_Different_Levels.py, the POS layer uses the output from the first LSTM layer, while chunking has a task-specific LSTM layer:
params = {'classifier': ['CRF'], 'LSTM-Size': [100], 'dropout': (0.25, 0.25),
'customClassifier': {'unidep_pos': ['Softmax'], 'conll2000_chunking': [('LSTM', 50), 'CRF']}}
Both networks have one shared LSTM (LSTM-Size), then pos uses 'Softmax' on top of that shared LSTM layer. Chunking in contrast uses an LSTM with 50 units and then a CRF.
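The different-levels setup can be sketched the same way. Again, this is only an illustration in plain Python, not the repository's implementation: `task_stacks` is a hypothetical helper, the layer names are invented, and falling back to 'classifier' for a task without a 'customClassifier' entry is an assumption made for this sketch.

```python
def task_stacks(params, tasks):
    """Hypothetical helper: list shared and task-specific layers per task.

    Assumes entries in 'customClassifier' are either a classifier name
    (e.g. 'Softmax', 'CRF') or a (layer_type, units) tuple such as
    ('LSTM', 50). Layer names are illustrative only.
    """
    shared = ["shared_embedding"]
    for i, units in enumerate(params["LSTM-Size"], start=1):
        shared.append(f"shared_LSTM_{i}({units})")

    custom = params.get("customClassifier", {})
    stacks = {}
    for task in tasks:
        # Assumption: tasks without a custom entry use the default classifier.
        head = custom.get(task, params["classifier"])
        own = []
        for j, layer in enumerate(head, start=1):
            if isinstance(layer, tuple):
                kind, units = layer  # e.g. ('LSTM', 50)
                own.append(f"{task}_{kind}_{j}({units})")
            else:
                own.append(f"{task}_{layer}")
        stacks[task] = shared + own
    return stacks


params = {'classifier': ['CRF'], 'LSTM-Size': [100], 'dropout': (0.25, 0.25),
          'customClassifier': {'unidep_pos': ['Softmax'],
                               'conll2000_chunking': [('LSTM', 50), 'CRF']}}
stacks = task_stacks(params, ["unidep_pos", "conll2000_chunking"])

for task, layers in stacks.items():
    print(task, "->", " -> ".join(layers))
```

Here the POS stack ends right after the shared LSTM with a Softmax, while the chunking stack adds its own 50-unit LSTM before the CRF, which is what puts the two tasks at different levels.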
OK, thanks for the detailed answer. Could you also explain the difference between "Train_Multitask" and "Train_Multitask_different_levels"?
Have a look at this paper: https://www.aclweb.org/anthology/P16-2038
Train_Multitask_different_levels implements the ideas from that paper.
Thanks for the link.
Hi @nreimers
For the multi-task framework, does it always have to be POS and chunking, or could it be any sequence labelling tasks?