OpenGVLab / Vision-RWKV

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
https://arxiv.org/abs/2403.02308
Apache License 2.0

segmentation output #9

yezizi1022 opened this issue 2 months ago

yezizi1022 commented 2 months ago

For the class `VRWKV_Adapter(VRWKV)`, why is the output a set of four feature maps:

```python
f1 = self.norm1(c1)
f2 = self.norm2(c2)
f3 = self.norm3(c3)
f4 = self.norm4(c4)
```

i.e. `torch.Size([2, 256, 56, 56])`, `torch.Size([2, 256, 28, 28])`, `torch.Size([2, 256, 14, 14])`, and `torch.Size([2, 256, 7, 7])`, rather than a single output such as `torch.Size([2, 256, 224, 224])`?

duanduanduanyuchen commented 2 months ago

Hi, this is because we use a ViT-Adapter-like structure to generate multi-scale features for dense prediction tasks.
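To illustrate the idea (not the repo's actual implementation): a ViT-Adapter-style head returns a feature pyramid at strides 4, 8, 16, and 32, which is what detection/segmentation decoders such as FPN or UperNet consume. The sketch below is a hypothetical stand-in that simply resamples one backbone feature map to those four resolutions for a 224x224 input, just to show where the four shapes in the question come from:

```python
import torch
import torch.nn.functional as F

def multi_scale_features(x):
    """Hypothetical sketch: turn one (B, C, 14, 14) feature map into the
    four pyramid levels a ViT-Adapter-style head emits for 224x224 input.
    The real adapter builds these with learned interaction blocks; here we
    only resample, to make the output shapes concrete."""
    sizes = [(56, 56), (28, 28), (14, 14), (7, 7)]  # strides 4, 8, 16, 32
    return [F.interpolate(x, size=s, mode="bilinear", align_corners=False)
            for s in sizes]

x = torch.randn(2, 256, 14, 14)  # batch 2, 256 channels, stride-16 grid
f1, f2, f3, f4 = multi_scale_features(x)
print([tuple(f.shape) for f in (f1, f2, f3, f4)])
# -> [(2, 256, 56, 56), (2, 256, 28, 28), (2, 256, 14, 14), (2, 256, 7, 7)]
```

A single full-resolution map like `[2, 256, 224, 224]` would lose the coarse-to-fine hierarchy that dense-prediction decoders expect.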