1404561326521 commented 5 months ago

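Hello, in the attention module of DFormer.py there is the step self.pool = nn.AdaptiveAvgPool2d(output_size=(7,7)), whose purpose is to accept an input of any shape and return a (7,7) pooled result. However, this AdaptiveAvgPool2d operator is not supported when exporting an ONNX model. Does supporting dynamic input shapes here serve any specific purpose, and if I want to export an ONNX model, how can I solve this problem? Thanks in advance for your advice.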

caojiaolong commented 5 months ago

Thank you for your feedback and your interest in our project! For the ONNX export problem you mentioned, we think there are two possible solutions:

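1. Try replacing torch's implementation with a custom AdaptiveAvgPool2d; see https://github.com/pytorch/pytorch/issues/74034 for reference. Note that you need to set dynamic_axes in torch.onnx.export when exporting.
2. If your input size is fixed, you can try replacing AdaptiveAvgPool2d with a plain AvgPool2d, but note that the AvgPool2d parameters have to be computed separately for each stage.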
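Option 2 comes down to computing a fixed kernel and stride for every stage, because each stage sees a different feature-map size. A minimal sketch, assuming a fixed 480×640 input (the size used in the export script later in this thread) and assuming the four stages downsample by 4, 8, 16 and 32 (not confirmed here for DFormer). It uses the same stride/kernel formula as the custom pooling discussed below; the resulting pairs would go into `nn.AvgPool2d(kernel_size=..., stride=...)` for the corresponding stage.

```python
def avgpool_params(in_size, out_size=7):
    """Kernel and stride for one spatial dimension so that a plain AvgPool2d
    (no padding) produces exactly `out_size` outputs:
        stride = floor(in / out), kernel = in - (out - 1) * stride
    """
    if in_size < out_size:
        raise ValueError(f"feature size {in_size} is smaller than output size {out_size}")
    stride = in_size // out_size
    kernel = in_size - (out_size - 1) * stride
    return kernel, stride


def per_stage_pool_params(img_h, img_w, stage_strides=(4, 8, 16, 32), out=(7, 7)):
    """Assumption: the feature map of a stage with stride s is (img_h // s, img_w // s)."""
    params = []
    for s in stage_strides:
        h, w = img_h // s, img_w // s
        kh, sh = avgpool_params(h, out[0])
        kw, sw = avgpool_params(w, out[1])
        params.append({"stage_stride": s, "feature": (h, w),
                       "kernel_size": (kh, kw), "stride": (sh, sw)})
    return params


# Fixed 480x640 input; one (kernel_size, stride) pair per stage
for p in per_stage_pool_params(480, 640):
    print(p)
```

With these windows the output is always 7×7, since (in − kernel) / stride + 1 = 7. The windows are guaranteed to match those of `nn.AdaptiveAvgPool2d` only when the feature size is a multiple of 7; otherwise they may differ slightly.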

1404561326521 commented 5 months ago

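OK, I have tried replacing the AdaptiveAvgPool2d operation with a custom implementation. During inference I found another issue: with the evaluate_msf method the results are very satisfying, but it uses five scale factors, [0.5, 0.75, 1.0, 1.25, 1.5], which means running inference five times and then fusing the results, and that is very time-consuming. With the evaluate method the results are much worse. Is this related to the C.train_scale_array = [0.5, 0.75, 1, 1.25, 1.5, 1.75] setting used during training? Could the model be trained at a single scale, without the multi-scale operations, so that inference also needs only one pass? Would the results be much worse? I wonder whether you have tried this; any advice would be appreciated.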

caojiaolong commented 5 months ago

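evaluate_msf evaluates at multiple scales and then fuses the results; the purpose of this multi-scale testing is to improve test performance and robustness. The [0.5, 0.75, 1, 1.25, 1.5, 1.75] used during training, by contrast, is for data augmentation: it crops diverse images out of a limited set of images, indirectly increasing the amount of training data. In this operation the image is first resized at different rates, and then a fixed-size crop is taken as the model input.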
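To make the cost of multi-scale testing concrete, here is a minimal sketch of multi-scale score fusion in the spirit of evaluate_msf. It is not DFormer's implementation: `model` is any callable returning per-pixel class scores, resizing is nearest-neighbour only to keep the sketch dependency-free (real pipelines usually interpolate bilinearly), and the per-scale scores are simply averaged.

```python
import numpy as np


def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    h, w = arr.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return arr[rows[:, None], cols[None, :]]


def multi_scale_predict(model, image, scales=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Average per-pixel class scores over several input scales.

    `model` maps an (h, w, C) image to (h, w, num_classes) scores;
    every scale costs one forward pass.
    """
    h, w = image.shape[:2]
    fused = None
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        scores = model(resize_nearest(image, sh, sw))  # forward pass at this scale
        scores = resize_nearest(scores, h, w)          # back to the original size
        fused = scores if fused is None else fused + scores
    return (fused / len(scales)).argmax(axis=-1)       # per-pixel class index
```

The loop makes the trade-off explicit: five scales mean five forward passes, while single-scale evaluation corresponds to `scales=(1.0,)`.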

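For general scenarios, I think this multi-scale resize-and-crop data augmentation is usually necessary. Common datasets, including ADE20K, all use this kind of augmentation.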
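The resize-and-crop augmentation described above can be sketched as follows. This is a minimal illustration, not DFormer's data loader; the assumptions are nearest-neighbour resizing, zero-padding the image, and padding the label with a hypothetical ignore value of 255 whenever the resized image is smaller than the crop.

```python
import numpy as np


def resize_nearest(arr, out_h, out_w):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    h, w = arr.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return arr[rows[:, None], cols[None, :]]


def random_scale_crop(image, label, crop_hw,
                      scales=(0.5, 0.75, 1, 1.25, 1.5, 1.75),
                      ignore_index=255, rng=None):
    """Resize image/label by a random rate from `scales`, then take a random
    fixed-size crop. Pads (image with 0, label with `ignore_index`) when the
    resized image is smaller than the crop."""
    rng = np.random.default_rng() if rng is None else rng
    s = float(rng.choice(scales))
    h, w = image.shape[:2]
    nh, nw = max(1, round(h * s)), max(1, round(w * s))
    image, label = resize_nearest(image, nh, nw), resize_nearest(label, nh, nw)

    ch, cw = crop_hw
    ph, pw = max(ch - nh, 0), max(cw - nw, 0)
    if ph or pw:
        image = np.pad(image, ((0, ph), (0, pw)) + ((0, 0),) * (image.ndim - 2))
        label = np.pad(label, ((0, ph), (0, pw)), constant_values=ignore_index)

    top = rng.integers(0, image.shape[0] - ch + 1)
    left = rng.integers(0, image.shape[1] - cw + 1)
    return (image[top:top + ch, left:left + cw],
            label[top:top + ch, left:left + cw])
```

Every call returns a crop of the same fixed size, but its content comes from a differently scaled and positioned view of the image, which is where the extra diversity comes from.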

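For some fixed scenes or special datasets, resize-and-crop augmentation may not be necessary. For example, if the scenes to be segmented follow a fairly fixed pattern, resizing and cropping may lose key information, and in that case these augmentations may not be needed.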

1404561326521 commented 5 months ago

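Fusing the evaluations from multiple scales with evaluate_msf does greatly improve the segmentation results, but the time cost is too high: every scale requires another inference pass. Meanwhile, single-scale inference with evaluate gives strange results with a black border around the image. Do you have any good solution for this?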

caojiaolong commented 5 months ago

Hello, I tested single-scale inference on my side and visualized the results, and there is no black border as you described:

[Image: visualization of the single-scale inference results]

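I suggest checking whether the image size settings in the config file used for training are correct, or whether the images were unexpectedly padded to a different size when they were loaded. If that does not solve it, please share some concrete results so that we can better analyze the possible cause.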
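Since padding is one of the suspects, here is one common pattern for keeping padding out of the output. It is a sketch under stated assumptions, not DFormer's code: the input is padded on the bottom/right to an assumed multiple of 32, the model runs on the padded image, and the prediction is cropped back to the original height and width so padded pixels cannot show up as a border.

```python
import numpy as np


def pad_to_multiple(image, multiple=32):
    """Zero-pad the bottom/right of an (H, W[, C]) array so H and W are multiples of `multiple`."""
    h, w = image.shape[:2]
    ph, pw = (-h) % multiple, (-w) % multiple
    pad = ((0, ph), (0, pw)) + ((0, 0),) * (image.ndim - 2)
    return np.pad(image, pad), (h, w)


def predict_without_border(model, image, multiple=32):
    """Run `model` on the padded image, then crop the prediction back to H x W."""
    padded, (h, w) = pad_to_multiple(image, multiple)
    pred = model(padded)   # prediction for the padded image
    return pred[:h, :w]    # drop rows/cols that only exist because of padding
```

If a pipeline pads but skips the final crop, the padded strip is predicted and visualized along with the real image, which is one way a border can appear.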

1404561326521 commented 5 months ago

OK, thanks for the reminder; I have fixed that error.

1404561326521 commented 5 months ago

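Hello, I have another question. When converting the trained model to ONNX I set dynamic_axes for dynamic input sizes, but running inference with the ONNX model fails with: onnxruntime::ReshapeHelper::ReshapeHelper gsl::narrow_cast(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,2,42,16}, requested shape:{1,2,7,7,16}. I found a discussion of the cause online, https://github.com/pytorch/pytorch/issues/99701, which reports the same problem for transformers and gives a corresponding fix, but I would like to ask how DFormer should be modified. Many thanks.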

caojiaolong commented 5 months ago

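Hello, we suspect that there is a problem with your earlier modification of the pooling, so that its output size is no longer 7*7 but some other value. As a result, in the code at: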

https://github.com/VCIP-RGBD/DFormer/blob/2aa25e362807b1027bddb3046a96cf1c8ec89cbf/models/encoders/DFormer.py#L122

the short_cut variable ends up with the wrong size [1,2,42,16] instead of [1,2,49,16], which in turn makes the reshape at

https://github.com/VCIP-RGBD/DFormer/blob/2aa25e362807b1027bddb3046a96cf1c8ec89cbf/models/encoders/DFormer.py#L128

fail.

I suggest checking whether your modified pooling function is correct.
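The shapes in the error can be checked by hand: the reshape needs 1×2×7×7×16 = 1568 elements but receives 1×2×42×16 = 1344, and 42 = 6×7, i.e. one side of the pooled grid came out as 6 instead of 7. The sketch below shows one hypothetical way to get there: a fixed pooling window computed for a 15×20 feature map and then applied to a 13×20 one. Both sizes are assumptions chosen for illustration, not values taken from this issue.

```python
from math import prod

# Element counts from the ONNX Runtime error message
actual = prod((1, 2, 42, 16))       # shape of short_cut that was received
requested = prod((1, 2, 7, 7, 16))  # shape the reshape asks for
print(actual, requested, 42 == 6 * 7)


def pooled_grid(feat_hw, kernel, stride):
    """Output grid of a fixed AvgPool2d window (no padding) on an H x W map."""
    return tuple((f - k) // s + 1 for f, k, s in zip(feat_hw, kernel, stride))


# Hypothetical: window chosen for a 15x20 map, then reused on a 13x20 map
window = dict(kernel=(3, 8), stride=(2, 2))
print(pooled_grid((15, 20), **window))  # (7, 7) -> 49 tokens
print(pooled_grid((13, 20), **window))  # (6, 7) -> 42 tokens
```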

1404561326521 commented 5 months ago

I replaced the original nn.AdaptiveAvgPool2d (this operator cannot be exported to ONNX) with:

```python
import numpy as np
import torch
import torch.nn as nn


class AdaptiveAvgPool2dCustom(nn.Module):
    def __init__(self, output_size):
        super(AdaptiveAvgPool2dCustom, self).__init__()
        self.output_size = np.array(output_size)

    def forward(self, x: torch.Tensor):
        '''
        Args:
            x: shape (batch size, channel, height, width)
        Returns:
            x: shape (batch size, channel, 1, output_size)
        '''
        shape_x = x.shape
        if (shape_x[-1] < self.output_size[-1]):
            paddzero = torch.zeros((shape_x[0], shape_x[1], shape_x[2], self.output_size[-1] - shape_x[-1]))
            paddzero = paddzero.to('cuda:0')
            x = torch.cat((x, paddzero), axis=-1)

        if (shape_x[-2] < self.output_size[-2]):
            paddzero = torch.zeros((shape_x[0], shape_x[1], shape_x[2], self.output_size[-2] - shape_x[-2]))
            paddzero = paddzero.to('cuda:0')
            x = torch.cat((x, paddzero), axis=-1)

        stride_size = np.floor(np.array(x.shape[-2:]) / self.output_size).astype(np.int32)
        kernel_size = np.array(x.shape[-2:]) - (self.output_size - 1) * stride_size
        avg = nn.AvgPool2d(kernel_size=list(kernel_size), stride=list(stride_size))
        x = avg(x)
        return x
```

With this change the model trains and runs inference normally, but the exported ONNX model has problems: ONNX only supports static graphs, so some of the dynamic shapes inside the model cannot be obtained, and the export emits warnings.

Although the above procedure can export an ONNX model, loading it in C++ for inference fails with: onnxruntime::ReshapeHelper::ReshapeHelper gsl::narrow_cast(input_shape.Size()) == size was false. The input tensor cannot be reshaped to the requested shape. Input shape:{1,2,42,16}, requested shape:{1,2,7,7,16}. Here not only the sizes but also the number of dimensions do not match. Searching for the cause led me to https://github.com/pytorch/pytorch/issues/99701, but I still need your advice on how to solve this for DFormer. Alternatively, could you share your ONNX export script? This is my export procedure:

```python
import torch
import torch.nn as nn
from models.builder import EncoderDecoder as segmodel
from local_configs.NYUDepthv2.DFormer_Tiny import C as config

criterion = nn.CrossEntropyLoss(reduction="mean", ignore_index=config.background)
BatchNorm2d = nn.BatchNorm2d
model = segmodel(cfg=config, criterion=criterion, norm_layer=BatchNorm2d, single_GPU=True)

checkpoint = torch.load(r'../weights/change_model/tiny/epoch-282.pth')
model.load_state_dict(checkpoint['model'])
model.eval()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

img_shape = (1, 3, 480, 640)
depth_shape = (1, 3, 480, 640)
img_input = torch.randn(img_shape, device=device)
depth_input = torch.randn(depth_shape, device=device)

# Run a forward pass to infer the kernel sizes
with torch.no_grad():
    out = model(img_input, depth_input)

input_names = ["ImgInput", "DepthInput"]
dynamic_axes = {'ImgInput': {0: 'batch_size', 2: 'height', 3: 'width'},
                'DepthInput': {0: 'batch_size', 2: 'height', 3: 'width'}}

model_path = r'../weights/change_model/tiny/epoch-282.onnx'

# Export the model to ONNX, passing both tensors as inputs;
# dynamic_axes must be passed here, otherwise the height/width axes stay fixed
torch.onnx.export(model, (img_input, depth_input), model_path,
                  input_names=input_names, dynamic_axes=dynamic_axes,
                  verbose=True, opset_version=11)
```

caojiaolong commented 5 months ago

Hello, I noticed that your AdaptiveAvgPool2dCustom seems to have a problem. I think it can be modified as follows:

```python
import numpy as np
import torch
import torch.nn as nn


class AdaptiveAvgPool2dCustom(nn.Module):
    def __init__(self, output_size):
        super(AdaptiveAvgPool2dCustom, self).__init__()
        self.output_size = np.array(output_size)

    def forward(self, x: torch.Tensor):
        """
        Args:
            x: shape (batch size, channel, height, width)
        Returns:
            x: shape (batch size, channel, output_size[0], output_size[1])
        """
        shape_x = x.shape
        # Zero-pad the width (last dim) if it is smaller than the target width
        if shape_x[-1] < self.output_size[-1]:
            paddzero = torch.zeros(
                (shape_x[0], shape_x[1], shape_x[2], self.output_size[-1] - shape_x[-1]),
                dtype=x.dtype, device=x.device,
            )
            x = torch.cat((x, paddzero), dim=-1)
        # Re-read the shape so the height padding uses the padded width
        shape_x = x.shape
        # Zero-pad the height (dim -2) if it is smaller than the target height
        if shape_x[-2] < self.output_size[-2]:
            paddzero = torch.zeros(
                (shape_x[0], shape_x[1], self.output_size[-2] - shape_x[-2], shape_x[3]),
                dtype=x.dtype, device=x.device,
            )
            x = torch.cat((x, paddzero), dim=-2)

        # Fixed stride/kernel chosen so that AvgPool2d yields exactly output_size
        stride_size = np.floor(np.array(x.shape[-2:]) / self.output_size).astype(
            np.int32
        )
        kernel_size = np.array(x.shape[-2:]) - (self.output_size - 1) * stride_size
        avg = nn.AvgPool2d(kernel_size=list(kernel_size), stride=list(stride_size))
        x = avg(x)
        return x
```

This should ensure that the output has the correct 7*7 size. Please check whether the problem persists after this change.
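To check the shape logic of the corrected module without a GPU, the same arithmetic can be re-implemented in NumPy. `pad_then_avgpool` is a hypothetical helper for testing only, not part of DFormer: it pads the width, then the height, up to the target size and averages over the same fixed windows.

```python
import numpy as np


def pad_then_avgpool(x, output_size=(7, 7)):
    """NumPy version of the corrected AdaptiveAvgPool2dCustom arithmetic for an
    (N, C, H, W) array: zero-pad W, then H, up to output_size, then average
    over fixed windows with stride = floor(size / out) and
    kernel = size - (out - 1) * stride."""
    x = np.asarray(x, dtype=float)
    out_h, out_w = output_size
    n, c, h, w = x.shape
    if w < out_w:  # pad width (last axis) first
        x = np.concatenate([x, np.zeros((n, c, h, out_w - w))], axis=-1)
    n, c, h, w = x.shape  # re-read so the height padding uses the padded width
    if h < out_h:  # pad height (axis -2)
        x = np.concatenate([x, np.zeros((n, c, out_h - h, w))], axis=-2)
    n, c, h, w = x.shape
    sh, sw = h // out_h, w // out_w
    kh, kw = h - (out_h - 1) * sh, w - (out_w - 1) * sw
    out = np.empty((n, c, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            win = x[:, :, i * sh:i * sh + kh, j * sw:j * sw + kw]
            out[:, :, i, j] = win.mean(axis=(-2, -1))
    return out


print(pad_then_avgpool(np.ones((1, 2, 15, 20))).shape)  # (1, 2, 7, 7)
print(pad_then_avgpool(np.ones((1, 2, 5, 6))).shape)    # (1, 2, 7, 7)
```

One further point, taken from the PyTorch ONNX export documentation rather than tested on DFormer: values computed with NumPy or plain Python from `x.shape` are recorded as constants during export. In this module `kernel_size` and `stride_size` are therefore fixed to the export resolution even when `dynamic_axes` is set, so inputs of a different height or width can still produce a grid other than 7×7.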

VCIP-RGBD / DFormer

AdaptiveAvgPool2d operator not supported when exporting to ONNX #17