Closed Continue7777 closed 5 years ago
Recently I did some experiments with BERT and the Transformer on text classification. I find that the position-wise feed-forward network always consists of two linear transformations with a ReLU activation in between, but you use conv? Do you have any particular reasoning behind this change?
https://github.com/brightmart/text_classification/blob/3e7911b57651b16eda12f2508007143f376b7a99/a07_Transformer/a2_poistion_wise_feed_forward.py#L35-L58
Hi, I did see the above setting in BERT. I use conv following the Transformer implementation (tensor2tensor). We think it may have fewer parameters.
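For reference, here is a minimal NumPy sketch (not the repo's TensorFlow code) illustrating the relationship between the two formulations: when the convolution kernel size is 1, applying a conv over the sequence axis computes exactly the same thing as applying the same two linear layers at every position, with an identical parameter count. All function and variable names below are illustrative, not taken from the repository.

```python
import numpy as np

def ffn_linear(x, w1, b1, w2, b2):
    # Two linear transformations with a ReLU in between (BERT style).
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

def ffn_conv1x1(x, w1, b1, w2, b2):
    # The same computation written as two kernel-size-1 convolutions
    # over the sequence axis (tensor2tensor style). With kernel size 1,
    # the conv at each position reduces to a matrix multiply, so the
    # weights (and parameter count) match the linear version exactly.
    seq_len = x.shape[0]
    h = np.empty((seq_len, w1.shape[1]))
    for t in range(seq_len):  # slide a width-1 kernel over positions
        h[t] = np.maximum(x[t] @ w1 + b1, 0.0)
    out = np.empty((seq_len, w2.shape[1]))
    for t in range(seq_len):
        out[t] = h[t] @ w2 + b2
    return out

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 32, 5
x  = rng.standard_normal((seq_len, d_model))
w1 = rng.standard_normal((d_model, d_ff)); b1 = rng.standard_normal(d_ff)
w2 = rng.standard_normal((d_ff, d_model)); b2 = rng.standard_normal(d_model)
assert np.allclose(ffn_linear(x, w1, b1, w2, b2),
                   ffn_conv1x1(x, w1, b1, w2, b2))
```

So for kernel size 1 the two are mathematically equivalent; a parameter difference would only arise with a kernel wider than 1, which mixes neighboring positions.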