ViTAE-Transformer / ViTAE-Transformer-Remote-Sensing

A comprehensive list [SAMRS@NeurIPS'23, RVSA@TGRS'22, RSP@TGRS'22] of our research works related to remote sensing, including papers, code, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP

use one image to test issues #19

Closed: mstroke-bird closed this issue 1 year ago

mstroke-bird commented 1 year ago

I have a question: when I run semantic-segmentation inference with the ViTAEv2 model you provide, my image fails at the `assert N == H*W` in ReductionCell.py, yet the same image produces results normally with the other two checkpoints.

1. Does this ViTAEv2 model only support image sizes that are multiples of 2?
2. When I predict on a 1024×512 PNG, it still fails: a size mismatch also occurs at `outs.append(x.view(b, wh, wh, -1).permute(0, 3, 1, 2))` in ViTAE_Window_NoShift/base_model.py.

The weights I used are shown in the screenshot below: [image]

Thank you very much if you have time to answer.
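For reference, here is a minimal sketch of one way an odd edge can trip `assert N == H*W` in a stride-2 reduction. The mechanism (floor- vs ceil-divided bookkeeping) and all numbers are assumptions for illustration, not taken from the repo's code:

    # Illustrative only: assumes the downsample produces ceil(H/2) x ceil(W/2)
    # tokens while the resolution is bookkept as floor(H/2) x floor(W/2).
    H, W = 499, 999                          # odd-sized map from a 999x499 image
    H_floor, W_floor = H // 2, W // 2        # 249, 499 -> bookkept resolution
    H_ceil, W_ceil = -(-H // 2), -(-W // 2)  # 250, 500 -> tokens actually produced
    N = H_ceil * W_ceil                      # 125000 tokens in the sequence
    print(N == H_floor * W_floor)            # False: 125000 != 124251, assert fires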

DotWang commented 1 year ago

@mstroke-bird No, it isn't. The HRSC2016 dataset in the detection part has odd-length edges, and it runs fine.

I looked at the code. Regarding your second question, the segmentation side does indeed have a problem; see base_model.py in the detection folder: https://github.com/ViTAE-Transformer/ViTAE-Transformer-Remote-Sensing/blob/main/Object%20Detection/mmdet/models/backbones/ViTAE_Window_NoShift/base_model.py

That part looks like this:

    def forward_features(self, x, Wh, Ww):
        outs = []
        for i in range(len(self.layers)):
            layer = self.layers[i]
            # Each layer returns the current resolution (Wh, Ww) along with
            # the tokens, so the reshape below handles non-square maps.
            x, Wh, Ww = layer(x, Wh, Ww)
            b, n, _ = x.shape
            #wh = int(math.sqrt(n))
            #norm_layer = getattr(self, f'norm{i}')
            #x_out = norm_layer(x)
            outs.append(x.view(b, Wh, Ww, -1).permute(0, 3, 1, 2).contiguous())

        return outs
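To make the difference concrete, here is a minimal runnable sketch (shapes are hypothetical, not taken from the repo) of why the commented-out sqrt-based reshape fails on non-square feature maps while the (Wh, Ww) version works:

    import math
    import torch

    # Hypothetical shapes: a 1024x512 input at stride 4 gives Wh=128, Ww=256.
    b, Wh, Ww, c = 1, 128, 256, 64
    x = torch.randn(b, Wh * Ww, c)

    # Detection-style reshape: the true (Wh, Ww) is threaded through,
    # so any aspect ratio works.
    out = x.view(b, Wh, Ww, -1).permute(0, 3, 1, 2).contiguous()
    print(out.shape)  # torch.Size([1, 64, 128, 256])

    # Segmentation-style reshape: assumes a square map, fails when Wh != Ww.
    wh = int(math.sqrt(Wh * Ww))  # 181, but 181 * 181 != 128 * 256
    try:
        x.view(b, wh, wh, -1)
    except RuntimeError as e:
        print("sqrt-based reshape failed:", e)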

As for the `N = H*W` part you mentioned, I need to investigate further. By the way, what is your input size, and what are the printed values of N, H, and W?

mstroke-bird commented 1 year ago

@DotWang

Thank you very much for your reply! I used a 999×499 image; the sizes are annotated in the comments of the screenshots below. [image] [image]

DotWang commented 1 year ago

@mstroke-bird Indeed, only multiples of 2 are supported, but (1024, 512) works; inputs are normally padded to a suitable size before being fed in.
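As an illustration of that padding step, here is a minimal sketch; the size divisor of 32 and the bottom/right padding convention are assumptions in the style of common mmseg/mmdet pipelines, not read from this repo's configs:

    import torch
    import torch.nn.functional as F

    def pad_to_multiple(img: torch.Tensor, divisor: int = 32) -> torch.Tensor:
        """Pad a (C, H, W) image on the bottom/right so H and W become
        multiples of `divisor` (hypothetical helper, illustrative only)."""
        _, h, w = img.shape
        pad_h = (divisor - h % divisor) % divisor
        pad_w = (divisor - w % divisor) % divisor
        return F.pad(img, (0, pad_w, 0, pad_h))  # (left, right, top, bottom)

    img = torch.randn(3, 499, 999)     # the odd-sized input from this thread
    print(pad_to_multiple(img).shape)  # torch.Size([3, 512, 1024])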

mstroke-bird commented 1 year ago

@DotWang Got it, thank you. Best wishes for your research!!