JingyunLiang / RVRT

Recurrent Video Restoration Transformer with Guided Deformable Attention (NeurIPS 2022, official repository)
https://arxiv.org/abs/2206.02146

A question about candidates in GDA #1

Closed: mrluin closed this issue 2 years ago

mrluin commented 2 years ago

Hello,

Thanks for your great work!

I have a question about the candidates in GDA. I think candidates=9 means that one position in Q should compute its correlation with 9 positions in K and V, as illustrated in Figure 3. Does it mean the dimension of K/V should be expanded 9 times?

Looking forward to your reply, thanks!

JingyunLiang commented 2 years ago

Thanks for your question. It means K/V are sampled 9 times.
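
A minimal pure-PyTorch sketch of what "sampled 9 times" could look like, i.e. bilinearly sampling the K/V feature map at 9 offset locations per query position. The tensor names, shapes, and use of `F.grid_sample` are illustrative assumptions, not the repository's actual code:

```python
# Illustrative sketch: sample 9 candidate K/V positions per query location.
# Names and shapes are hypothetical; RVRT's real implementation differs.
import torch
import torch.nn.functional as F

B, C, H, W = 1, 32, 64, 64   # batch, channels, height, width (example sizes)
N = 9                        # number of candidate positions per query

kv = torch.randn(B, C, H, W)          # feature map that K/V are sampled from
offsets = torch.randn(B, N, 2, H, W)  # predicted (x, y) offsets, one per candidate

# Base sampling grid in normalized [-1, 1] coordinates, as grid_sample expects.
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
base_grid = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)

candidates = []
for n in range(N):
    # Shift the grid by the n-th offset (offsets assumed already normalized).
    grid = base_grid + offsets[:, n].permute(0, 2, 3, 1)
    # Bilinearly sample one candidate K/V map: (B, C, H, W).
    candidates.append(F.grid_sample(kv, grid, align_corners=True))

# Stack into (B, N, C, H, W): 9 sampled K/V candidates per query position.
candidates = torch.stack(candidates, dim=1)
```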

mrluin commented 2 years ago

Thanks for your quick reply. So K/V are sampled 9 times, attention is performed over the 9 candidates for each Q, and the results are aggregated, right? Doesn't that consume too much GPU memory?

JingyunLiang commented 2 years ago

The attention itself is the aggregation. We wrote it in CUDA, so the memory usage is controllable.
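
A naive PyTorch sketch of this attention-based aggregation over the 9 candidates, continuing the shapes from the sketch above. This version materializes all candidates, which is what the CUDA kernel presumably avoids to keep memory controllable; it is illustrative only:

```python
# Illustrative sketch: aggregate N sampled candidates via attention weights.
import torch

B, N, C, H, W = 1, 9, 32, 64, 64   # example sizes
q = torch.randn(B, C, H, W)        # query features
k = torch.randn(B, N, C, H, W)     # 9 sampled key candidates
v = torch.randn(B, N, C, H, W)     # 9 sampled value candidates

# Correlation of each query with its 9 candidates: (B, N, H, W).
attn = (q.unsqueeze(1) * k).sum(dim=2) / C ** 0.5

# Softmax over the candidate dimension, then weighted sum of values.
attn = attn.softmax(dim=1)                # (B, N, H, W)
out = (attn.unsqueeze(2) * v).sum(dim=1)  # (B, C, H, W)
```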

mrluin commented 2 years ago

Sorry for asking another question under a closed issue.

In the ablation study, I notice that you compare different alignment methods, and the improvement of GDA* over DCN is about 0.07 dB. Flow-guided deformable convolution is commonly more stable and performs better than plain DCN, so how would the performance be if attention were conducted on K/V aligned by flow-guided deformable convolution?

Thanks in advance.

JingyunLiang commented 2 years ago

I haven't trained that yet, but it looks too complex (not elegant) to stack them together.