Closed. Tackoil closed this issue 4 years ago.
It's actually been a while since I wrote the code, so let me answer your questions as best I remember.
As far as I know, a vocabulary built by the torchtext library starts with the [UNK] and [PAD] special tokens at indices 0 and 1, respectively. That's why I passed `padding_idx=1`, to make explicit that index 1 corresponds to the padding token [PAD].
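For reference, here is a minimal sketch of what `padding_idx` does (the vocabulary size and dimensions are made up for illustration; this is not the exact code from the repo):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only.
char_vocab_size, char_dim = 100, 8

# padding_idx=1 tells nn.Embedding that index 1 is the [PAD] token:
# its embedding vector is initialized to zeros and its gradient is
# always zero, so padded positions contribute nothing.
char_emb = nn.Embedding(char_vocab_size, char_dim, padding_idx=1)

ids = torch.tensor([[5, 7, 1, 1]])  # a word padded with [PAD] (index 1)
out = char_emb(ids)
print(out[0, 2])                    # all zeros at the padded position
```

So the only "special effect" is that the [PAD] character is mapped to a zero vector that is never updated during training.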
That seems weird to me, too, because I'm used to most convolution layers being followed by some form of non-linear activation such as ReLU (as you mentioned). It looks like I simply forgot to add the ReLU. But I don't understand why adding the non-linear activation leads to a much worse result. If your further experiments show that augmenting the convolutional layer with a non-linear function actually helps, please let me know with a pull request so I can revise the code.
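To make the discussion concrete, here is a minimal sketch of a character-level CNN where the ReLU in question can be toggled (the names, dimensions, and pooling choice are assumptions for illustration, not the repo's actual layer):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharCNN(nn.Module):
    """Embed characters, convolve over the character axis,
    then max-pool over time. `use_relu` toggles the activation
    discussed in this issue."""
    def __init__(self, char_vocab_size, char_dim=64, num_filters=100,
                 kernel_size=5, use_relu=False):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab_size, char_dim, padding_idx=1)
        self.conv = nn.Conv1d(char_dim, num_filters, kernel_size)
        self.use_relu = use_relu

    def forward(self, char_ids):
        # char_ids: (batch, seq_len, word_len)
        b, s, w = char_ids.size()
        x = self.char_emb(char_ids.view(b * s, w))   # (b*s, w, char_dim)
        x = self.conv(x.transpose(1, 2))             # (b*s, filters, w - k + 1)
        if self.use_relu:                            # the activation in question
            x = F.relu(x)
        x = x.max(dim=2).values                      # max-over-time pooling
        return x.view(b, s, -1)                      # (batch, seq_len, filters)
```

With max-over-time pooling, ReLU mainly clips negative filter responses before the max, so its effect on the final result can be fairly small, which may be consistent with the experiments below.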
Thank you for your reply. I ran the code in a debugger and the vocabulary is exactly as you say. I will try applying ReLU and other non-linear functions and open a PR if it helps. It will take some time because the lab's shared computer has an older Python version.
Hi. Due to my negligence, my local repository was not up to date: the previous results did not include commit 165730f9266fe29386f31df7e86ba25bee6b90da. With that commit included, the results are as follows.
Experiment | EM | F1 |
---|---|---|
Without ReLU | 64.8 | 75.3 |
With ReLU | 64.3 | 75.2 |
With ReLU | 64.9 | 75.4 |
With ReLU | 64.1 | 75.0 |
Previous comment:
Hi. I finished the experiments and the results are below.
Experiment | EM | F1 |
---|---|---|
Without ReLU | 64.8 | 75.4 |
With ReLU | 64.4 | 75.3 |
With ReLU | 63.9 | 74.7 |
With ReLU | 64.2 | 75.3 |
I think the difference is not significant. I'll open a PR with the change, and you can decide whether to merge it according to your needs. Thank you.
Merged your pull request. Thanks.
Hi. Thank you for sharing your code. It has been a great help and I learned a lot from it.
I'm a little confused about `padding_idx=1` in `self.char_emb`. As I understand it, this parameter means that when the embedding layer receives index 1, it outputs a zero vector. Is there anything special about the character corresponding to index 1?
My other question is about the CharCNN. I glanced at the official code, which is implemented in TensorFlow, and found that there is a ReLU after the CNN. I tried applying the ReLU in my code, but the result decreased by about 5%. Why didn't you use it? Is this just an oversight, or is there another reason?