Hi!

In your paper, you mention that:

> For the embedding of each masked point patch, we replace it with a share-weighted learnable mask token

Could you provide some intuition for why a single shared token works for all masked positions? I had initially assumed that each masked patch would need its own unique learnable mask token, but your results show that this is not the case.
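For concreteness, here is a minimal PyTorch sketch of how I understand the mechanism (the shapes and variable names are my own assumptions, not your code): one learnable vector is broadcast into every masked slot, so before positional embeddings are added, all masked positions carry the identical input.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

embed_dim = 8
num_patches = 6

# A single learnable token shared across all masked positions
# (illustrative sketch only, not the authors' implementation).
mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))

patch_embeddings = torch.randn(1, num_patches, embed_dim)
mask = torch.tensor([[True, False, True, True, False, False]])  # which patches are masked

# Broadcast the one shared token into every masked slot,
# keeping the real embeddings at unmasked positions.
x = torch.where(
    mask.unsqueeze(-1),
    mask_token.expand(1, num_patches, -1),
    patch_embeddings,
)

# All masked positions now hold the identical vector; only the
# positional embeddings (added afterwards) distinguish them.
masked_rows = x[mask]
assert torch.allclose(masked_rows[0], masked_rows[1])
```

If my reading is right, the token itself only needs to signal "this content is missing," while the per-position information comes entirely from the positional embedding, so a unique token per patch would be redundant.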
Thanks!