Closed Continue7777 closed 5 years ago
Recently I did some experiments with BERT and the Transformer on text classification. I find that the position-wise feed-forward network always consists of two linear transformations with a ReLU activation in between, but you use conv? Do you have any particular reasoning behind this change?
https://github.com/brightmart/text_classification/blob/3e7911b57651b16eda12f2508007143f376b7a99/a07_Transformer/a2_poistion_wise_feed_forward.py#L35-L58
Hi, I did see the above setting in BERT. I use conv following the Transformer implementation (tensor2tensor). We think it may have fewer parameters.
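For reference, here is a minimal NumPy sketch (not the repo's TensorFlow code) illustrating the relationship between the two formulations: when the convolution kernel size is 1, applying a conv over the sequence axis computes exactly the same thing as applying the same two linear layers at every position, with an identical parameter count. All function and variable names below are illustrative, not taken from the repository.

```python
import numpy as np

def ffn_linear(x, w1, b1, w2, b2):
    # Two linear transformations with a ReLU in between (BERT style).
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

def ffn_conv1x1(x, w1, b1, w2, b2):
    # The same computation written as two kernel-size-1 convolutions
    # over the sequence axis (tensor2tensor style). With kernel size 1,
    # the conv at each position reduces to a matrix multiply, so the
    # weights (and parameter count) match the linear version exactly.
    seq_len = x.shape[0]
    h = np.empty((seq_len, w1.shape[1]))
    for t in range(seq_len):  # slide a width-1 kernel over positions
        h[t] = np.maximum(x[t] @ w1 + b1, 0.0)
    out = np.empty((seq_len, w2.shape[1]))
    for t in range(seq_len):
        out[t] = h[t] @ w2 + b2
    return out

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 32, 5
x  = rng.standard_normal((seq_len, d_model))
w1 = rng.standard_normal((d_model, d_ff)); b1 = rng.standard_normal(d_ff)
w2 = rng.standard_normal((d_ff, d_model)); b2 = rng.standard_normal(d_model)
assert np.allclose(ffn_linear(x, w1, b1, w2, b2),
                   ffn_conv1x1(x, w1, b1, w2, b2))
```

So for kernel size 1 the two are mathematically equivalent; a parameter difference would only arise with a kernel wider than 1, which mixes neighboring positions.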