Closed · Stavr0sStuff closed this issue 3 years ago
Hello,

very interesting paper, and nice to publish parts of the code along with it!

A couple of questions: `self.v_conv` has a bias attached to it. Looking at other 'attention' implementations, it seems that those mostly exclude the bias (as you also do for the keys and queries). Did you see any improvement from adding a bias there?

Kind regards, steven

Hello, thanks for your attention. For the questions: `bias=True`, and `self.q_conv.bias = self.k_conv.bias` ties the query and key projections to a single shared bias parameter.

Best Regards, Meng-Hao
Thanks for your answers.

Regarding question 1: my apologies, I wrote the question in a confusing way. I was not so much wondering whether a bias should be added for the keys and queries, but rather whether the bias could be removed from the dense layer that calculates the values (unattended, i.e. before they are multiplied by the attention weights). Most implementations of the original 'Attention Is All You Need' paper seem not to use a bias in the value calculation.
Yes, you can remove the bias in the value calculation. In experiments, it does not seem necessary in the point cloud transformer.