kv = (k.transpose(-2, -1) * (N ** -0.5)) @ (v * (N ** -0.5))
I understand that N here is for scaling the dot-product attention. But the shapes of k and v have been changed by sr_ratio to (B, N1, C), where N1 = N / sr_ratio**2. So shouldn't there be an `if self.sr_ratio > 1:` branch for this part that replaces N with N1? Since this isn't mentioned in the paper, it's just my personal understanding, and I hope you will be able to explain it. Thank you very much.
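To make the question concrete, here is a minimal sketch (with hypothetical sizes B, N, C and sr_ratio, not the actual model configuration) showing that k and v indeed carry N1 = N / sr_ratio**2 tokens after spatial reduction, and that because the scale is just a scalar constant, replacing N with N1 would only rescale kv by a fixed factor N / N1 rather than change its shape:

```python
import torch

# Hypothetical sizes for illustration only.
B, N, C = 2, 64, 32       # batch, query tokens, channels
sr_ratio = 2
N1 = N // sr_ratio ** 2   # token count of k and v after spatial reduction

k = torch.randn(B, N1, C)
v = torch.randn(B, N1, C)

# The line in question: both factors are scaled by N ** -0.5,
# even though k and v now have only N1 tokens.
kv = (k.transpose(-2, -1) * (N ** -0.5)) @ (v * (N ** -0.5))
print(kv.shape)  # (B, C, C): the scale affects magnitude, not shape

# Using N1 instead of N changes kv only by the constant factor N / N1.
alt = (k.transpose(-2, -1) * (N1 ** -0.5)) @ (v * (N1 ** -0.5))
assert torch.allclose(kv * (N / N1), alt)
```

So the choice between N and N1 does not affect any tensor shapes; it only determines the normalization constant applied to the k-v product.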