galsang / BiDAF-pytorch

Re-implementation of BiDAF(Bidirectional Attention Flow for Machine Comprehension, Minjoon Seo et al., ICLR 2017) on PyTorch.

Some confusions about the Char Emb Layer in class BiDAF #21

Closed · Tackoil closed this issue 4 years ago

Tackoil commented 4 years ago

Hi. Thank you for sharing your code. It has been a great help and I've learned a lot from it.

I'm a little confused about `padding_idx=1` in `self.char_emb`. In my opinion, this parameter means that when the embedding layer receives a 1, it outputs a zero vector. Is there any special effect for the character corresponding to index 1?
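For example, a quick toy check (made-up sizes, not the actual hyper-parameters from the repo) behaves the way I describe:

```python
import torch
import torch.nn as nn

# Toy embedding layer; padding_idx=1 pins the row for index 1 to an all-zero
# vector, and its gradient is masked so it stays zero during training.
emb = nn.Embedding(num_embeddings=10, embedding_dim=4, padding_idx=1)

print(emb(torch.tensor([1])))  # tensor of zeros
print(emb(torch.tensor([2])))  # an ordinary learned vector
```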

The other question is about the CharCNN. I glanced at the official code, which is implemented in TensorFlow, and found that there is a ReLU after the CNN. I tried applying the ReLU in my code, but the result decreased by 5%. Why didn't you use it? Was it just an oversight, or is there another reason?

galsang commented 4 years ago

It’s actually been a while since I wrote the code, so let me respond to your questions as far as I remember.

  1. As far as I know, a vocabulary built by the torchtext library starts with the special tokens [UNK] and [PAD], whose indices are 0 and 1, respectively. That’s why I pass ‘padding_idx=1’: it marks index 1 as the padding token [PAD], so its embedding is fixed to a zero vector.

  2. That seems weird to me, too, because most convolution layers are followed by some form of non-linear activation such as ReLU (as you mentioned). Omitting the ReLU looks like a simple oversight on my part. But I don’t understand why adding the non-linearity would give a much worse result. If your further experiments show that a non-linear activation after the convolutional layer actually helps, please let me know with a pull request so I can revise the code. A rough sketch of what I mean is below.
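In case it helps, here is a simplified, illustrative sketch of the character CNN with the activation placed right after the convolution. It uses Conv1d and made-up names and sizes, not the repo's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes only; the real values come from the training arguments.
char_dim, char_channel_size, kernel_width = 8, 16, 5
char_conv = nn.Conv1d(char_dim, char_channel_size, kernel_width)

def char_cnn(char_emb, use_relu=True):
    # char_emb: (num_words, word_len, char_dim) character embeddings
    x = char_emb.transpose(1, 2)                  # (num_words, char_dim, word_len)
    x = char_conv(x)                              # (num_words, char_channel_size, word_len - kernel_width + 1)
    if use_relu:
        x = F.relu(x)                             # the activation applied after the conv in the TF code
    # max-over-time pooling gives one fixed-size vector per word
    return F.max_pool1d(x, x.size(2)).squeeze(2)  # (num_words, char_channel_size)

dummy = torch.randn(32, 12, char_dim)             # 32 words, 12 characters each
print(char_cnn(dummy, use_relu=True).shape)       # torch.Size([32, 16])
```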

Tackoil commented 4 years ago

Thank you for your reply. I ran the code in debug mode and the vocabulary is just as you say. I will try applying the ReLU (and other non-linear activations) and open a PR if it helps. It may take some time because the lab's shared computer has an older Python version.

Tackoil commented 4 years ago

Hi. Due to my negligence, my local repository was not up to date. The previous results did not include commit 165730f9266fe29386f31df7e86ba25bee6b90da. With that change, the results are as follows.

| Experiment   | EM   | F1   |
|--------------|------|------|
| Without ReLU | 64.8 | 75.3 |
| With ReLU    | 64.3 | 75.2 |
| With ReLU    | 64.9 | 75.4 |
| With ReLU    | 64.1 | 75.0 |

Previous comment:

Hi. I finished the experiments and the results are below.

| Experiment   | EM   | F1   |
|--------------|------|------|
| Without ReLU | 64.8 | 75.4 |
| With ReLU    | 64.4 | 75.3 |
| With ReLU    | 63.9 | 74.7 |
| With ReLU    | 64.2 | 75.3 |

I think the difference is not significant. I'll open a PR with the change, and you can decide whether to merge it according to your needs. Thank you.

galsang commented 4 years ago

Merged your pull request. Thanks.