autonomousvision / unimatch

[TPAMI'23] Unifying Flow, Stereo and Depth Estimation
https://haofeixu.github.io/unimatch/
MIT License

scripts for baselines in Table 1 #16

Closed IgorVasiljevic-TRI closed 1 year ago

IgorVasiljevic-TRI commented 1 year ago

I was just wondering what script (or outside repo?) I could use to replicate the baselines in Table 1 (i.e. cost volume + conv and conv + softmax). Thanks!

haofeixu commented 1 year ago

Hi, to reproduce the results of cost volume + conv, you can use the conv regressor below to predict flow from cost volume:

inter_channels = 512
# cost_volume_channels: number of channels of the input cost volume
self.regressor = nn.Sequential(
    nn.Conv2d(cost_volume_channels, inter_channels, 1),
    ResidualBlock(inter_channels, inter_channels, channel_compressor=2, no_norm_layer=True),
    ResidualBlock(inter_channels, inter_channels, channel_compressor=2, no_norm_layer=True),
    ResidualBlock(inter_channels, inter_channels, channel_compressor=2, no_norm_layer=True),
    ResidualBlock(inter_channels, inter_channels, channel_compressor=2, no_norm_layer=True),
    nn.Conv2d(inter_channels, 2, 3, 1, 1),  # predict 2-channel flow
)
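For reference, the cost volume this regressor consumes can be built as a dense all-pairs correlation between the two feature maps. The sketch below is my own minimal version (the exact layout used in the repo may differ); with this shape, `cost_volume_channels = H * W`:

```python
import torch


def global_cost_volume(feature1, feature2):
    """Dense all-pairs correlation, reshaped to a per-pixel cost volume.

    feature1, feature2: [B, C, H, W] feature maps of the two images.
    Returns: [B, H*W, H, W]; channel i holds the similarity of each
    source pixel with target pixel i (so cost_volume_channels = H * W).
    """
    b, c, h, w = feature1.shape
    f1 = feature1.flatten(2).transpose(1, 2)   # [B, H*W, C]
    f2 = feature2.flatten(2)                   # [B, C, H*W]
    corr = torch.matmul(f1, f2) / c ** 0.5     # [B, H*W (src), H*W (tgt)]
    # move target candidates into the channel dim, keep source pixels spatial
    return corr.permute(0, 2, 1).reshape(b, h * w, h, w)
```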

import torch.nn as nn


def conv1x1(in_planes, out_planes, stride=1, bias=False):
    # 1x1 convolution
    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=bias)


def conv3x3(in_planes, out_planes, stride=1, dilation=1, bias=False):
    # 3x3 convolution with padding
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, dilation=dilation, bias=bias)


class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, norm_layer=None,
                 channel_compressor=2, no_norm_layer=False,
                 dilation=1,
                 ):
        super(ResidualBlock, self).__init__()

        self.no_norm_layer = no_norm_layer
        bias = True if no_norm_layer else False

        if norm_layer is None:
            norm_layer = nn.BatchNorm2d

        # bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand
        self.conv1 = conv1x1(in_channels, out_channels // channel_compressor, bias=bias)
        if not no_norm_layer:
            self.bn1 = norm_layer(out_channels // channel_compressor)
        self.conv2 = conv3x3(out_channels // channel_compressor, out_channels // channel_compressor,
                             bias=bias,
                             dilation=dilation)
        if not no_norm_layer:
            self.bn2 = norm_layer(out_channels // channel_compressor)
        self.conv3 = conv1x1(out_channels // channel_compressor, out_channels, bias=bias)
        if not no_norm_layer:
            self.bn3 = norm_layer(out_channels)
        self.relu = nn.ReLU(inplace=True)

        # 1x1 projection on the skip path when channel counts differ
        self.skip = None
        if in_channels != out_channels:
            if no_norm_layer:
                self.skip = conv1x1(in_channels, out_channels, bias=bias)
            else:
                self.skip = nn.Sequential(
                    conv1x1(in_channels, out_channels, bias=bias),
                    norm_layer(out_channels)
                )

    def forward(self, x):
        out = self.conv1(x)
        if not self.no_norm_layer:
            out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        if not self.no_norm_layer:
            out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        if not self.no_norm_layer:
            out = self.bn3(out)

        if self.skip is not None:
            x = self.skip(x)

        out = out + x
        out = self.relu(out)

        return out

Note that all results in Table 1 are obtained without the self-attention propagation layer.

haofeixu commented 1 year ago

For conv + softmax, you can simply replace our Transformer with per-image convs, i.e., convolutions applied to each image's features independently, with no cross-view interaction.
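A minimal sketch of what that could look like, assuming a plain conv stack in place of the Transformer and softmax-weighted matching to regress flow (names, layer counts, and dimensions here are illustrative, not the repo's exact implementation):

```python
import torch
import torch.nn as nn


def make_per_image_convs(dim=128, num_layers=6):
    # stand-in for the Transformer: plain convs applied to each image's
    # features independently, so there is no cross-view interaction
    layers = []
    for _ in range(num_layers):
        layers += [nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)


def softmax_flow(feature1, feature2):
    # flow as the softmax-weighted average of matching coordinates
    b, c, h, w = feature1.shape
    f1 = feature1.flatten(2).transpose(1, 2)             # [B, H*W, C]
    f2 = feature2.flatten(2)                             # [B, C, H*W]
    prob = torch.softmax(torch.matmul(f1, f2) / c ** 0.5, dim=-1)
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).reshape(-1, 2).float()  # [H*W, 2]
    correspondence = torch.matmul(prob, grid)            # [B, H*W, 2]
    flow = (correspondence - grid).transpose(1, 2).reshape(b, 2, h, w)
    return flow


convs = make_per_image_convs(dim=64)
feat1, feat2 = torch.randn(1, 64, 8, 8), torch.randn(1, 64, 8, 8)
flow = softmax_flow(convs(feat1), convs(feat2))          # [1, 2, 8, 8]
```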

IgorVasiljevic-TRI commented 1 year ago

This is great, thank you!