Alobal opened 2 years ago
Hi,
PCT uses Batch Normalization instead of the Layer Normalization used in the original Transformer.
How did you weigh Layer Normalization against Batch Normalization when designing PCT?
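For reference, the practical difference is which axes the statistics are computed over. Below is a minimal NumPy sketch, assuming point features laid out as (batch, channels, points) as is common in PyTorch point-cloud code. The helpers `batch_norm` and `layer_norm` are hypothetical illustrations, not PCT's code. They omit the learnable scale/shift and the running statistics that a real BatchNorm layer uses at inference time.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Hypothetical helper: normalize x of shape (B, C, N) per channel,
    pooling statistics over the batch axis (B) and the point axis (N).
    Mirrors batch-norm behaviour in training mode, without affine params."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    """Hypothetical helper: normalize each point's feature vector
    independently over the channel axis (C), as LayerNorm over the
    feature dimension does in the original Transformer (no affine params)."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
# Toy features: 4 clouds, 8 channels, 16 points each.
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8, 16))

bn = batch_norm(x)
ln = layer_norm(x)

# BN: each channel is ~zero-mean / unit-variance across batch and points.
print(np.abs(bn.mean(axis=(0, 2))).max())
# LN: each point of each cloud is ~zero-mean / unit-variance across channels.
print(np.abs(ln.mean(axis=1)).max())

# LN output for one cloud does not depend on the other clouds in the batch;
# BN output does, because its statistics are shared across the batch.
print(np.allclose(layer_norm(x[:1]), ln[:1]))
print(np.allclose(batch_norm(x[:1]), bn[:1]))
```

The last two lines show the batch dependence that usually drives the choice: LayerNorm treats each sample on its own, while BatchNorm couples samples through shared per-channel statistics.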