fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Apache License 2.0

Confusion about the notation x_k and z_q #214

Open wennycooper opened 12 months ago

wennycooper commented 12 months ago

Hi, I'm trying to compare the notation in your paper with this article (let's call it doc2): http://jalammar.github.io/illustrated-transformer/

Would you please clarify:

  1. What is x_k? In doc2, X is the input and the key elements are X W^k. So in your paper, is x_k the k-th element of X W^k?

  2. What is z_q? In doc2, X is the input and the query elements are X W^q. So in your paper, is z_q the q-th element of X W^q, or is it the q-th element of the "object queries" (the input of the decoder)?
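
To make the doc2 side of the comparison concrete, here is a minimal NumPy sketch of the convention as I read it in doc2: single-head self-attention where queries, keys, and values are all projections of the same input X. The sizes, the random weights, and the names `W_q`, `W_k`, `W_v`, `query_q`, and `key_k` are my own illustrative assumptions. This shows only doc2's notation, not how the paper defines x_k and z_q.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumed, not taken from the paper or doc2).
n_tokens, d_model, d_k = 4, 8, 3

X = rng.normal(size=(n_tokens, d_model))   # inputs, one row per element
W_q = rng.normal(size=(d_model, d_k))      # query projection (W^q in doc2)
W_k = rng.normal(size=(d_model, d_k))      # key projection (W^k in doc2)
W_v = rng.normal(size=(d_model, d_k))      # value projection

# doc2 convention: queries, keys, and values are projections of the same X.
Q = X @ W_q
K = X @ W_k
V = X @ W_v

# Scaled dot-product attention with a row-wise softmax.
scores = Q @ K.T / np.sqrt(d_k)
scores -= scores.max(axis=1, keepdims=True)   # for numerical stability
weights = np.exp(scores)
weights /= weights.sum(axis=1, keepdims=True)

output = weights @ V

# The "q-th element of X W^q" and "k-th element of X W^k" from the questions above.
q, k = 1, 2
query_q = Q[q]
key_k = K[k]
```

In this sketch, `query_q` and `key_k` are the already-projected vectors. My question is whether the paper's z_q and x_k correspond to these, or to something before the projection.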