JingyunLiang / VRT

VRT: A Video Restoration Transformer (official repository)
https://arxiv.org/abs/2201.12288
Other
1.37k stars 130 forks

why not patch? #37

Open mountain-three opened 2 years ago

mountain-three commented 2 years ago

Why don't you treat each patch as a token for the embedding, instead of using the channel dimension as the embedding dim?

JingyunLiang commented 2 years ago

It's a good idea. Our approach is like using 1x1 patches. Using larger patches may reduce the computation cost significantly. However, I haven't tried it yet, because I personally believe low-level problems should keep pixel-level information. Maybe you can try it and share some feedback ~
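To make the "1x1 patches" comparison concrete, here is a rough sketch of how the token count and the raw size of each token change with the patch size. It is not taken from the VRT code: the helper names `num_tokens` and `raw_token_dim` and the clip size are hypothetical and chosen only for illustration, and `channels` stands for whatever channel dimension the tokens carry.

```python
def num_tokens(frames, height, width, patch):
    """Tokens produced when every frame is split into non-overlapping patch x patch tiles."""
    if height % patch or width % patch:
        raise ValueError("height and width must be divisible by patch")
    return frames * (height // patch) * (width // patch)


def raw_token_dim(channels, patch):
    """Raw values covered by one token, before any linear projection."""
    return channels * patch * patch


# Hypothetical clip size, used only for illustration.
frames, height, width, channels = 6, 64, 64, 3

for patch in (1, 2, 4):
    n = num_tokens(frames, height, width, patch)
    d = raw_token_dim(channels, patch)
    print(f"patch={patch}: tokens={n}, raw values per token={d}")
```

With `patch=1` every pixel is its own token and the channel dimension is the whole embedding, which matches the "1x1 patches" description. A patch size of `p` cuts the number of tokens by a factor of `p * p`, which is where the computation saving would come from, but each token then has to summarize `p * p` pixels, so per-pixel detail is only kept if the projection and the final reconstruction can recover it.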

mountain-three commented 2 years ago

Thank you, I will think about it.

------------------ Original Message ------------------ From: "JingyunLiang/VRT" @.>; Sent: Saturday, June 18, 2022, 4:41 PM @.>; @.**@.>; Subject: Re: [JingyunLiang/VRT] why not patch? (Issue #37)
