For the class VRWKV_Adapter(VRWKV), why is the output
f1 = self.norm1(c1)
f2 = self.norm2(c2)
f3 = self.norm3(c3)
f4 = self.norm4(c4)
i.e. torch.Size([2, 256, 56, 56])
torch.Size([2, 256, 28, 28])
torch.Size([2, 256, 14, 14])
torch.Size([2, 256, 7, 7])
rather than a single output, for example torch.Size([2, 256, 224, 224])?
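For reference, the four shapes look like one image seen at strides 4, 8, 16 and 32, which is the usual multi-scale (feature pyramid) layout. Here is a minimal sketch of that arithmetic, assuming a 224x224 input. The input size and the helper name `pyramid_shapes` are my own assumptions for illustration, not something taken from the VRWKV_Adapter code.

```python
def pyramid_shapes(batch, channels, height, width, strides=(4, 8, 16, 32)):
    """Return the (N, C, H, W) shape of each level for the given strides.

    Hypothetical helper: it only reproduces the shape arithmetic and does not
    call VRWKV_Adapter or any other model code.
    """
    return [(batch, channels, height // s, width // s) for s in strides]


# Assumed input: batch of 2, 256 output channels, 224x224 spatial size.
shapes = pyramid_shapes(2, 256, 224, 224)
for name, shape in zip(("f1", "f2", "f3", "f4"), shapes):
    print(name, shape)
```

Under that assumption the computed shapes match the four sizes listed above, so each of f1 to f4 would be the same image at a different resolution rather than one full-resolution map.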