Meituan-AutoML / CPVT


no positional information for the first self attention block #6

Open congwang093 opened 1 year ago

congwang093 commented 1 year ago

Hi, thanks for your hard work. I read the paper, and if I understand correctly, the first transformer block doesn't have any positional information. Would this cause any issues for passing information on to the rest of the blocks, since self-attention modules usually come with some positional information? Have you tried any other relative positional encoding methods to fill in the gap for the first block? (A minimal sketch of the arrangement I mean is below.)
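
For context, here is a minimal sketch (my own simplification, not the repo's actual code) of the arrangement I'm referring to: the PEG is inserted after the first block, so block 0 attends over tokens that carry no positional signal. Names like `PEG`, `TinyCPVTEncoder`, `dim`, and `heads` are illustrative placeholders, and the block itself is a stock `nn.TransformerEncoderLayer` rather than the model's real block.

```python
import torch
import torch.nn as nn


class PEG(nn.Module):
    """Positional Encoding Generator: a depthwise conv over the 2D token map."""

    def __init__(self, dim: int):
        super().__init__()
        # 3x3 depthwise conv; the zero padding at the borders is what makes the
        # resulting encodings position-dependent (conditional on the input).
        self.proj = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        b, n, c = x.shape                      # (batch, h*w tokens, channels)
        feat = x.transpose(1, 2).reshape(b, c, h, w)
        feat = self.proj(feat) + feat          # conv output added as a residual
        return feat.flatten(2).transpose(1, 2)


class TinyCPVTEncoder(nn.Module):
    """Encoder stack with the PEG placed after the first block only."""

    def __init__(self, dim: int = 64, depth: int = 4, heads: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True)
            for _ in range(depth)
        )
        self.peg = PEG(dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        for i, blk in enumerate(self.blocks):
            x = blk(x)                         # block 0 sees no positional info
            if i == 0:
                x = self.peg(x, h, w)          # positions injected from block 1 on
        return x


tokens = torch.randn(2, 14 * 14, 64)           # dummy 14x14 grid of patch tokens
out = TinyCPVTEncoder()(tokens, 14, 14)
print(out.shape)                               # torch.Size([2, 196, 64])
```

My question is essentially whether block 0 operating purely on content (no positions at all) limits what it can pass forward, and whether something like a relative positional bias inside the first block's attention was ever tried to cover that gap.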