Hi, there.
Could I ask why the model returns four features, [f1, f2, f3, f4], and how those features are used in the end? For example, are they flattened and passed through a SoftMax, as in ViTs? I also looked for the network architecture but couldn't find it. Could you share it?
Many thanks.
ViT-CoMer is a multi-scale backbone, which is why it returns four feature maps, and it is used in the same way as other multi-scale backbones such as Swin. The network structure is defined in vit_comer.py.
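To make the "same usage as Swin" point concrete, here is a rough PyTorch sketch of how four multi-scale feature maps are typically consumed. It is not taken from vit_comer.py: the `TinyFPN` class, the channel width of 96, the strides of 4/8/16/32, and the 10-class linear head are all assumptions for illustration, and random tensors stand in for the real [f1, f2, f3, f4].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyFPN(nn.Module):
    """Hypothetical FPN-style neck: project each scale, then fuse top-down."""

    def __init__(self, in_channels, out_channels=64):
        super().__init__()
        # 1x1 convs bring every scale to the same channel width.
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )
        # 3x3 convs smooth each fused map.
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
             for _ in in_channels]
        )

    def forward(self, feats):
        laterals = [conv(f) for conv, f in zip(self.lateral, feats)]
        # Walk from the coarsest map to the finest, upsampling and adding.
        for i in range(len(laterals) - 1, 0, -1):
            upsampled = F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest"
            )
            laterals[i - 1] = laterals[i - 1] + upsampled
        return [conv(x) for conv, x in zip(self.smooth, laterals)]


# Stand-ins for [f1, f2, f3, f4]. Assumed: 224x224 input, strides 4/8/16/32,
# 96 channels per scale. Check vit_comer.py for the real shapes.
feats = [
    torch.randn(1, 96, 56, 56),
    torch.randn(1, 96, 28, 28),
    torch.randn(1, 96, 14, 14),
    torch.randn(1, 96, 7, 7),
]

# Dense-prediction usage: a neck turns the four maps into a feature pyramid
# that a detection or segmentation head would consume.
fpn = TinyFPN(in_channels=[96, 96, 96, 96], out_channels=64)
pyramid = fpn(feats)

# ViT-style classification usage (hypothetical): pool the coarsest map,
# apply a linear layer, then a softmax over classes.
classifier = nn.Linear(96, 10)
pooled = feats[-1].mean(dim=(2, 3))            # global average pool -> (1, 96)
probs = torch.softmax(classifier(pooled), dim=1)

print([tuple(p.shape) for p in pyramid])
print(tuple(probs.shape))
```

In detection and segmentation frameworks the neck and head normally come from the framework config, so the backbone only has to return the list of feature maps. A flatten-plus-SoftMax step would only appear if you attach a classification head yourself, as in the last few lines.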