lmomoy closed this issue 3 months ago
Hi, good question. For details, please see Sec. 4.2, Paragraph 1 of our paper: "Re-weigh options of attention mechanism". In short, vector attention suits unstructured data, while scalar attention suits structured data.
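For readers unfamiliar with the two terms, here is a minimal pure-Python sketch of the difference for a single query point and its neighbors. It is an illustration only, not the PT code: the function names are hypothetical, the learned MLP and positional encoding that the vector form normally applies to the relation are omitted, and the inputs are made up.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_attention(q, keys, values):
    """Scalar (dot-product) attention: one weight per neighbor,
    shared by every channel of that neighbor's value."""
    scale = math.sqrt(len(q))
    scores = [sum(qc * kc for qc, kc in zip(q, k)) / scale for k in keys]
    weights = softmax(scores)  # shape: [num_neighbors]
    out = [sum(w * v[c] for w, v in zip(weights, values))
           for c in range(len(q))]
    return out, weights


def vector_attention(q, keys, values):
    """Vector attention: one weight per neighbor AND per channel.
    Assumption: the relation is the plain difference q - k; a learned
    MLP and positional encoding would normally act on it (omitted)."""
    n, channels = len(keys), len(q)
    relations = [[qc - kc for qc, kc in zip(q, k)] for k in keys]
    # Softmax over neighbors, computed separately for each channel.
    per_channel = [softmax([r[c] for r in relations]) for c in range(channels)]
    weights = [[per_channel[c][j] for c in range(channels)]
               for j in range(n)]  # shape: [num_neighbors][channels]
    out = [sum(weights[j][c] * values[j][c] for j in range(n))
           for c in range(channels)]
    return out, weights


if __name__ == "__main__":
    # Made-up toy inputs: 3 neighbors, 2 channels.
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(scalar_attention(q, keys, values))
    print(vector_attention(q, keys, values))
```

The only structural difference is the shape of the weights: scalar attention yields one number per neighbor, while vector attention yields a full vector of per-channel weights per neighbor.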
I noticed that PT v2 uses vector attention like v1, while PT v3 uses scalar attention. Why not continue with vector attention?