facebookresearch/SparseConvNet

Submanifold sparse convolutional networks
https://github.com/facebookresearch/SparseConvNet

RuntimeError: Function SubmanifoldConvolutionFunctionBackward returned an invalid gradient at index 0 - got [0, 128] but expected shape compatible with [0, 64] #146

Open · forvd opened this issue 4 years ago

forvd commented 4 years ago

I've tried to use SparseConvNet for point cloud segmentation and ran into some errors. After a few training steps I get this strange error. I use spconv.VoxelGeneratorV2 to generate the voxels and coords. Here is my code:

class Network(nn.Module):
    def __init__(self, num_classes=20, use_norm=True, cfg=None, use_xyz=True):
        super().__init__()
        self.backbone = SCN_UNet(cfg)
        self.fc = nn.Sequential(
            pt_utils.Conv1d(32, 128, bn=True, bias=False),
            nn.Dropout(0.5),
            pt_utils.Conv1d(128, num_classes, activation=None))

    def forward(self, input_data):
        output = {}
        voxels = torch.tensor(input_data['voxels'], dtype=torch.float32).cuda()
        coords = torch.tensor(input_data['coordinates'], dtype=torch.float32).cuda()
        num_points_per_voxel = torch.tensor(input_data['num_points_per_voxel'], dtype=torch.int32).cuda()
        output_shape = input_data['grid_size'].tolist()[0][::-1]
        # final_input = [coords, voxels, 1]
        features = self.backbone(voxels, coords, num_points_per_voxel, output_shape, 1)
        output['backbone_features'] = torch.unsqueeze(features, dim=2)
        output['cls_out'] = self.fc(output['backbone_features']).transpose(1, 2).contiguous()
        return output

class SCN_UNet(nn.Module):
    def __init__(self, config):
        nn.Module.__init__(self)
        self.dimension = config.Backbone.dimension
        self.spatialSize = config.Backbone.spatialSize
        self.numFeatures = config.Backbone.numFeatures
        self.reps = config.Backbone.reps
        self.nPlanes = config.Backbone.nPlanes
        self.sparseModel = scn.Sequential().add(
            scn.InputLayer(self.dimension, self.spatialSize, mode=3)).add(
            scn.SubmanifoldConvolution(self.dimension, 4, self.numFeatures, 3, False)).add(
            scn.UNet(self.dimension, self.reps, self.nPlanes, residual_blocks=False, downsample=[2, 2])).add(
            scn.BatchNormReLU(self.numFeatures)).add(
            scn.OutputLayer(self.dimension))
        self.linear = nn.Linear(config.Backbone.numFeatures, 20)

    def forward(self, features, coors, num_voxels, grid_size, batch_size):
        points_mean = features[:, :, :, :(self.dimension + 1)].sum(
            dim=2, keepdim=False) / num_voxels.type_as(features).view(-1, 1)
        points_mean = points_mean.contiguous()
        # coors[:, 1] += 1
        coors = coors.int()
        sparse_input = [torch.squeeze(coors), torch.squeeze(points_mean)]
        x = self.sparseModel(sparse_input)
        # x = self.linear(x)
        return x
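
For comparison, the hello-world example in the SparseConvNet README feeds scn.InputLayer a [coords, features] pair in which coords is a LongTensor of shape (N, dimension+1) whose last column is the batch index. A minimal 3-D sketch of that convention (grid size, coordinates, and feature values below are made up for illustration, not taken from this issue):

import torch
import sparseconvnet as scn

dimension = 3
spatial_size = torch.LongTensor([256, 256, 32])   # illustrative grid size
input_layer = scn.InputLayer(dimension, spatial_size, mode=3)

# (N, dimension+1) LongTensor: x, y, z plus a batch index in the last column
coords = torch.LongTensor([[12, 120, 223, 0],
                           [12, 120, 221, 0],
                           [ 4, 111, 124, 0]])
features = torch.randn(3, 4)                      # 4 input features per active site

sparse_input = input_layer([coords, features])    # fed to the rest of the scn model
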
btgraham commented 4 years ago

Is it possible that the entire batch is empty? This would happen if the input points don't fall in the cube [0, self.spatialSize-1]^3.

Can you print out an input that generates the error, please?
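
A quick way to test this hypothesis is to count how many points actually land inside the grid before calling the sparse model. This is only a debugging sketch; `coords` and `spatial_size` stand in for whatever coordinate tensor and scn.InputLayer size your model uses:

# Debugging sketch: if no point lies inside [0, spatial_size), the batch is empty.
spatial = torch.as_tensor(spatial_size, device=coords.device)
inside = ((coords >= 0) & (coords < spatial)).all(dim=1)
print('points inside the grid:', int(inside.sum()), 'of', coords.shape[0])
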

forvd commented 4 years ago

My input to self.sparseModel is [coords, features]:

shape: torch.Size([12400, 3]) torch.Size([12400, 4])

[tensor([[ 12, 120, 223],
         [ 12, 120, 221],
         [ 12, 120, 220],
         ...,
         [  4, 111, 124],
         [  5, 115, 125],
         [  5, 116, 124]], device='cuda:0', dtype=torch.int32),
 tensor([[30.9434,  0.0519,  1.2550,  0.2350],
         [30.4116,  0.1957,  1.2379,  0.2700],
         [30.2993,  0.2905,  1.2339,  0.1400],
         ...,
         [ 1.3888, -2.4004, -1.1622,  0.0000],
         [ 1.6063, -1.2806, -0.8258,  0.0000],
         [ 1.4836, -1.0410, -0.7130,  0.0000]], device='cuda:0')]

HaiwangYu commented 4 years ago

I had the same error, and @btgraham's comment works for me. After arranging the coords to lie within [0, spatialSize]^3, loss.backward() works fine. It would be even better if there were some instructions/explanations on this spatialSize.
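
One way to do what this comment describes is sketched below. It is not the exact code from this thread; `coords` is assumed to be the (N, 3) integer coordinate tensor, `features` the matching feature tensor, and `spatial_size` whatever is passed to scn.InputLayer:

# Sketch: shift coordinates so the minimum corner sits at 0, then drop any
# points that still fall outside the [0, spatial_size) grid.
coords = coords - coords.min(dim=0, keepdim=True).values
inside = (coords < torch.as_tensor(spatial_size, device=coords.device)).all(dim=1)
coords, features = coords[inside], features[inside]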