Open abeyang00 opened 3 years ago
Hi~ The proposal feature contains information about its corresponding object. It updates itself by interacting with the RoI feature, so we don't need feature information for each pixel position.
So the RoI feature can be regarded as the Query and the proposal features as the Key?
I guess the Query is the proposal features [100 x C] and the Key is the RoI features [100 x (7 x 7 x C)]. Think about DETR: the Query is the object queries [100 x C], and the Key is the reshaped image feature map repeated 100 times [100 x (HW x C)], where each (HW x C) copy is the same.
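The DETR half of this comparison can be checked with a small shape sketch. This is only an illustration in NumPy, not code from DETR or Sparse R-CNN; the sizes and the names `queries`, `memory`, and `keys_tiled` are made up for the example. It shows that scoring 100 queries against one shared [HW x C] key gives the same result as scoring each query against its own identical copy, i.e. a [100 x HW x C] key.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, H, W = 100, 16, 5, 6  # small illustrative sizes, not real model sizes

queries = rng.standard_normal((N, C))      # object queries        [N x C]
memory = rng.standard_normal((H * W, C))   # flattened feature map [HW x C]

# DETR-style view: every query is scored against the same HW keys.
scores_shared = queries @ memory.T         # [N x HW]

# Equivalent view: the key is copied once per query, [N x HW x C],
# and query n is only scored against copy n.
keys_tiled = np.broadcast_to(memory, (N, H * W, C))
scores_tiled = np.einsum("nc,nkc->nk", queries, keys_tiled)  # [N x HW]

same = np.allclose(scores_shared, scores_tiled)
print(scores_shared.shape, scores_tiled.shape, same)
```

Under this reading, the only change in the RoI case is that the N copies are no longer identical: each query gets its own 7 x 7 set of keys.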
@PeizeSun Don't Q and K need to have the same hidden dimension for the matrix multiplication to work? In DETR, Q is [100 x C] and K is [HW x C], not [100 x (HWC)].
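One way to reconcile the shapes, sketched below in NumPy, is to read [100 x (7 x 7 x C)] as a batch of 100 keys of shape [49 x C], so the hidden dimension stays C and the product is a batched matrix multiplication. This is an attention-style reading for illustration only and is not the Sparse R-CNN dynamic head (the author advises against the Q, K, V reading elsewhere in this thread). The names, sizes, and the channel-last memory layout are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, S = 100, 16, 7 * 7  # proposals, channels, RoI positions (illustrative)

proposal_feats = rng.standard_normal((N, C))   # [N x C]
roi_flat = rng.standard_normal((N, S * C))     # [N x (7*7*C)]

# Un-flatten so the last axis is the hidden dimension C again.
# Assumes the flat layout is position-major with channels last.
roi_feats = roi_flat.reshape(N, S, C)          # [N x 49 x C]

# Batched matmul: (N, 1, C) @ (N, C, S) -> (N, 1, S).
# Proposal n is scored only against the 49 positions of its own RoI.
scores = (proposal_feats[:, None, :] @ roi_feats.transpose(0, 2, 1))[:, 0, :]

# Softmax over the 49 positions, then a weighted sum of the RoI features.
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
updated = np.einsum("ns,nsc->nc", weights, roi_feats)  # [N x C]

print(scores.shape, updated.shape)
```

The point is only that both operands share the hidden dimension C once the flattened RoI tensor is reshaped; the batch axis of size 100 pairs each proposal with its own RoI.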
I have a question regarding the proposal feature.
In the DETR paper, the reshaped feature map (HW x C) is given as input to the transformer encoder to learn correlations between pixels. However, in your paper, you use a C-dimensional vector (named 'prop_feats') instead of the reshaped feature map.
How does this C-dimensional vector learn the correlations among pixels? In my understanding, it does not contain feature information for each pixel position.
I saw your reply in one of the previous issues, where you said 'don't understand dynamic head as Q,K,V'. How should I understand this concept, then?
Thank you in advance!