UKPLab / elmo-bilstm-cnn-crf

BiLSTM-CNN-CRF architecture for sequence tagging using ELMo representations.
Apache License 2.0

Multi-Task-Learning Architectures Clarification #28

Open FahadGhamdi opened 5 years ago

FahadGhamdi commented 5 years ago

Hi, I would like to ask about the MTL architectures; I'm trying to visualize the two MTL architectures provided in this repo.

The first one is Train_MultiTask. In this architecture, the POS and chunking tasks share the embedding layer and one LSTM layer, while the CRF layer is task-specific. So I'm assuming there are two loss functions, one for each task, right?

The second architecture is Train_MultiTask_Different_Levels. Here both tasks share one LSTM layer: POS uses a Softmax/CRF directly on top of the shared LSTM layer to predict the POS tags, while chunking has a task-specific LSTM with 50 units followed by a CRF. I'm assuming that the order of the columns (the POS tag column and the chunking column) in the dataset is essential for this architecture.

Does figure 2 (b and c) in this paper, https://arxiv.org/pdf/1803.11326.pdf, describe these two architectures?
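To make the first setup concrete, here is a rough sketch of what I am picturing (a minimal stand-in, not the repo's code: plain softmax heads replace the ChainCRF layer, and all sizes are illustrative):

```python
from keras.models import Model
from keras.layers import Input, LSTM, Bidirectional, TimeDistributed, Dense

EMB_DIM, N_POS, N_CHUNK = 1024, 45, 23   # illustrative sizes

# Shared input (pre-computed token vectors) and one shared BiLSTM layer
tokens = Input(shape=(None, EMB_DIM))
shared = Bidirectional(LSTM(100, return_sequences=True))(tokens)

# Task-specific output layers (softmax here; ChainCRF in the repo)
pos_head   = TimeDistributed(Dense(N_POS,   activation='softmax'), name='POS')(shared)
chunk_head = TimeDistributed(Dense(N_CHUNK, activation='softmax'), name='chunking')(shared)

model = Model(inputs=tokens, outputs=[pos_head, chunk_head])
# Two loss terms, one per task head, as assumed above:
model.compile(optimizer='adam',
              loss={'POS': 'sparse_categorical_crossentropy',
                    'chunking': 'sparse_categorical_crossentropy'})
```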

Final question: I'm now training ELMo on my corpus. If I want to use the MTL architectures with ELMo, are the following the only changes that need to be made?

1. Copy lines 56 to 76 from Train_Chunking.py.
2. Paste the copied lines into Train_MultiTask_Different_Levels.py at line 56.

Please let me know if further modifications are needed.

nreimers commented 5 years ago

Hi @JokerCS757, yes, your description and the linked figures are correct.

Yes, updating the Train_MultiTask_Different_Levels.py to use ELMo embeddings should be quite straightforward with the steps you described.

I hope it works without any problems.

Best regards Nils Reimers

FahadGhamdi commented 5 years ago

Thanks for your reply. One more question: is it possible to enable the CNN option when using ELMo? If so, would you please briefly illustrate the input and output of each layer? Thanks.

nreimers commented 5 years ago

Hi @JokerCS757, you can try it; the code is there, but I'm not sure it works correctly, as I never tested it. Just set the charEmbeddings config parameter to 'CNN' and see what happens.
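Something like the following sketch, assuming the params dict the training scripts pass to the repo's BiLSTM class; the other keys are just examples and may differ in your checkout, only charEmbeddings matters here:

```python
from neuralnets.BiLSTM import BiLSTM  # this repo's model class

# Illustrative params dict; 'charEmbeddings' is the switch to try.
params = {'classifier': ['CRF'],
          'LSTM-Size': [100, 100],
          'dropout': (0.5, 0.5),
          'charEmbeddings': 'CNN'}  # untested with pre-computed ELMo vectors

model = BiLSTM(params)
```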

In general, this architecture is a bit different from the one in the bilstm-cnn-crf repository: here, the ELMo embeddings and word embeddings are pre-computed, and the resulting vectors are passed to the trainable Keras network. In the bilstm-cnn-crf code, the words themselves are passed to the Keras network, which then looks up the matching word embeddings.
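Roughly, the pre-computation step looks like this sketch, using allennlp's ElmoEmbedder (the sentence and shapes are illustrative):

```python
from allennlp.commands.elmo import ElmoEmbedder

elmo = ElmoEmbedder()  # loads the default pre-trained ELMo weights
# All three ELMo layers for one sentence: numpy array of shape (3, num_tokens, 1024)
layers = elmo.embed_sentence(["The", "cat", "sat"])
# The fixed vectors, not the raw words, are what the Keras network receives:
token_vectors = layers.mean(axis=0)  # (num_tokens, 1024)
```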

However, if you use ELMo embeddings, I think the CNN option is obsolete: ELMo uses a character-based CNN to derive word embeddings at its first layer. This output is then passed through two stacked LSTM layers. In this code, all three layers from ELMo are used, i.e., you already use a character-based CNN when you train with ELMo embeddings. I'm not sure a second character-based CNN would add much benefit.
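Continuing the sketch above, the first row of `layers` is already the character-CNN output:

```python
char_cnn_out = layers[0]  # layer 0: ELMo's character-based CNN over each word
lstm1_out    = layers[1]  # layer 1: first biLSTM layer
lstm2_out    = layers[2]  # layer 2: second biLSTM layer
```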

Best regards Nils Reimers

FahadGhamdi commented 5 years ago

I will test it to see how it performs.
Thanks!!