Closed: dogyoonlee closed this issue 3 years ago
Hi, thank you for your interest in PCT. In our experiments, we found that initializing the q_conv and k_conv kernels with the same weights helps the entire network converge better. Moreover, we observed that the weights of the q_conv and k_conv kernels are different after training.
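For readers following along, here is a minimal PyTorch sketch of how two kernels can start identical and still end up different. It assumes that "same weight" means copying the initial values into two separate parameters; the thread does not show the actual PCT code, so the layer sizes, the toy loss, and the variable names below are hypothetical and only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

channels = 8  # hypothetical size, chosen only for illustration

# Two separate 1x1 convolutions acting as query and key projections.
q_conv = nn.Conv1d(channels, channels // 4, kernel_size=1, bias=False)
k_conv = nn.Conv1d(channels, channels // 4, kernel_size=1, bias=False)

# Assumption: copy the values only, so both layers start identical
# but remain two independent parameters.
with torch.no_grad():
    q_conv.weight.copy_(k_conv.weight)

same_at_init = torch.equal(q_conv.weight, k_conv.weight)
shares_parameter = q_conv.weight is k_conv.weight

# One SGD step on an arbitrary toy loss that uses queries and keys asymmetrically.
x = torch.randn(2, channels, 16)  # (batch, channels, points)
params = list(q_conv.parameters()) + list(k_conv.parameters())
opt = torch.optim.SGD(params, lr=0.1)

q = q_conv(x).permute(0, 2, 1)  # (batch, points, dim)
k = k_conv(x)                   # (batch, dim, points)
attention = torch.softmax(torch.bmm(q, k), dim=-1)
loss = attention[:, :, 0].sum()

opt.zero_grad()
loss.backward()
opt.step()

same_after_step = torch.equal(q_conv.weight, k_conv.weight)
print(same_at_init, shares_parameter, same_after_step)
```

Under this assumption the two kernels receive different gradients, so they can drift apart during training. If the same parameter object were assigned to both layers instead, they would stay tied, so which behaviour applies depends on how the initialization is actually written.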
Hello,
I really appreciate your creative work.
However, I would like to know why you use the same weight values when initializing the q_conv and k_conv kernels.
As far as I know, they don't have to be identical.
Is there any reason?
Thank you again for your work.