Why does img_neck have no encoder layer?

Great work! However, I get errors when testing "PolarFormer, R101_DCN" and "PolarFormer-w/o_bev_aug, R101_DCN" with the checkpoints you posted.

The neck is defined like:

img_neck=dict(
    type='FPN_TRANS',
    num_encoder=0, # encoder is not used here
    num_decoder=3,
    num_levels=3,
    ...
),

The img_neck is defined with no encoder layers, but PyTorch does not seem to support a transformer with zero layers, which leads to the following error:

  File "tools/test.py", line 222, in main
    outputs = multi_gpu_test(model, data_loader, args.tmpdir,
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/mmdet/apis/test.py", line 109, in multi_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/mmcv/parallel/distributed.py", line 165, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])  # type: ignore
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 208, in new_func
    return old_func(*args, **kwargs)
  File "xxx/PolarFormer-main/projects/mmdet3d_plugin/models/detectors/polarformer.py", line 118, in forward
    return self.forward_test(**kwargs)
  File "xxx/PolarFormer-main/projects/mmdet3d_plugin/models/detectors/polarformer.py", line 171, in forward_test
    return self.simple_test(img_metas[0], img[0], **kwargs)
  File "xxx/PolarFormer-main/projects/mmdet3d_plugin/models/detectors/polarformer.py", line 192, in simple_test
    img_feats = self.extract_feat(img=img, img_metas=img_metas, gt_bboxes_3d=gt_bboxes_3d, gt_labels_3d=gt_labels_3d)
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 119, in new_func
    return old_func(*args, **kwargs)
  File "xxx/PolarFormer-main/projects/mmdet3d_plugin/models/detectors/polarformer.py", line 77, in extract_feat
    img_feats = self.extract_img_feat(img, img_metas, gt_bboxes_3d, gt_labels_3d)
  File "xxx/PolarFormer-main/projects/mmdet3d_plugin/models/detectors/polarformer.py", line 70, in extract_img_feat
    img_feats = self.img_neck(img_feats, img_metas, gt_bboxes_3d, gt_labels_3d)
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "xxx/PolarFormer-main/projects/mmdet3d_plugin/models/necks/fpn_trans.py", line 213, in forward
    ret_polar_ray_list = self._forward_single_camera(feature_single_cam, cam2lidar_info_single_cam, cam_intrinsic_single_cam)
  File "xxx/PolarFormer-main/projects/mmdet3d_plugin/models/necks/fpn_trans.py", line 146, in _forward_single_camera
    bev_out = self.transformer_layers[i](img_columns, polar_rays)
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 145, in forward
    memory = self.encoder(src, mask=src_mask, src_key_padding_mask=src_key_padding_mask)
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 206, in forward
    first_layer = self.layers[0]
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/container.py", line 197, in __getitem__
    return self._modules[self._get_abs_string_index(idx)]
  File "xxx/miniconda3/envs/py38/lib/python3.8/site-packages/torch/nn/modules/container.py", line 187, in _get_abs_string_index
    raise IndexError('index {} is out of range'.format(idx))
IndexError: index 0 is out of range

How can I fix this?

Hi! In early experiments, img_neck had an encoder, but we removed it in later experiments to reduce the computational cost. Since the current code differs a lot from the old version, I'm not sure whether an encoder in img_neck would improve the model's performance. You could set num_encoder to a non-zero value and give it a try.

I guess the error you hit is caused by a PyTorch version mismatch. We've tested the model with PyTorch 1.8.x + CUDA 10.2/11.1/11.6, and it worked well. I found that torch/nn/modules/transformer.py was updated in version 1.12: the forward method of class TransformerEncoder changed from the v1.8.0 code to the v1.12 code, both quoted here:

# v1.8.0
class TransformerEncoder(Module):
        # docs: https://pytorch.org/docs/1.8.0/generated/torch.nn.TransformerEncoder.html#torch.nn.TransformerEncoder.forward
        def forward(self, src: Tensor, mask: Optional[Tensor] = None, src_key_padding_mask: Optional[Tensor] = None) -> Tensor:
            r"""Pass the input through the encoder layers in turn.

            Args:
                src: the sequence to the encoder (required).
                mask: the mask for the src sequence (optional).
                src_key_padding_mask: the mask for the src keys per batch (optional).

            Shape:
                see the docs in Transformer class.
            """
            output = src

            for mod in self.layers:
                output = mod(output, src_mask=mask, src_key_padding_mask=src_key_padding_mask)

            if self.norm is not None:
                output = self.norm(output)

            return output

# v1.12
class TransformerEncoder(Module):
        # docs: https://pytorch.org/docs/1.12/generated/torch.nn.TransformerEncoder.html#torch.nn.TransformerEncoder.forward
        def forward(self, src: Tensor, mask: Optional[Tensor] = None, src_key_padding_mask: Optional[Tensor] = None) -> Tensor:
            r"""Pass the input through the encoder layers in turn.

            Args:
                src: the sequence to the encoder (required).
                mask: the mask for the src sequence (optional).
                src_key_padding_mask: the mask for the src keys per batch (optional).

            Shape:
                see the docs in Transformer class.
            """
            output = src
            convert_to_nested = False
            first_layer = self.layers[0]
            if isinstance(first_layer, torch.nn.TransformerEncoderLayer):
                if (not first_layer.norm_first and not first_layer.training and
                        first_layer.self_attn.batch_first and
                        first_layer.self_attn._qkv_same_embed_dim and first_layer.activation_relu_or_gelu and
                        first_layer.norm1.eps == first_layer.norm2.eps and
                        src.dim() == 3 and self.enable_nested_tensor) :
                    if src_key_padding_mask is not None and not output.is_nested and mask is None:
                        tensor_args = (
                            src,
                            first_layer.self_attn.in_proj_weight,
                            first_layer.self_attn.in_proj_bias,
                            first_layer.self_attn.out_proj.weight,
                            first_layer.self_attn.out_proj.bias,
                            first_layer.norm1.weight,
                            first_layer.norm1.bias,
                            first_layer.norm2.weight,
                            first_layer.norm2.bias,
                            first_layer.linear1.weight,
                            first_layer.linear1.bias,
                            first_layer.linear2.weight,
                            first_layer.linear2.bias,
                        )
                        if not torch.overrides.has_torch_function(tensor_args):
                            if not torch.is_grad_enabled() or all([not x.requires_grad for x in tensor_args]):
                                if output.is_cuda or 'cpu' in str(output.device):
                                    convert_to_nested = True
                                    output = torch._nested_tensor_from_mask(output, src_key_padding_mask.logical_not())

            for mod in self.layers:
                if convert_to_nested:
                    output = mod(output, src_mask=mask)
                else:
                    output = mod(output, src_mask=mask, src_key_padding_mask=src_key_padding_mask)

            if convert_to_nested:
                output = output.to_padded_tensor(0.)

            if self.norm is not None:
                output = self.norm(output)

            return output
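
Whether a given PyTorch install is affected can be checked outside PolarFormer. The following is only a minimal sketch: the helper name zero_encoder_forward_works and the tensor sizes are made up for illustration and are not part of the PolarFormer code. It builds an nn.Transformer with num_encoder_layers=0 and reports whether a forward pass raises the IndexError shown in the traceback.

```python
import torch
from torch import nn


def zero_encoder_forward_works() -> bool:
    """Return True if nn.Transformer with zero encoder layers can run forward.

    Hypothetical helper for illustration; sizes are arbitrary.
    """
    model = nn.Transformer(d_model=32, nhead=4, num_encoder_layers=0,
                           num_decoder_layers=1, dim_feedforward=64).eval()
    src = torch.randn(5, 1, 32)  # (source length, batch, d_model)
    tgt = torch.randn(3, 1, 32)  # (target length, batch, d_model)
    try:
        with torch.no_grad():
            model(src, tgt)
    except IndexError:
        # Raised when forward indexes self.layers[0] on an empty layer list.
        return False
    return True


supported = zero_encoder_forward_works()
print(torch.__version__, "zero-layer encoder works:", supported)
```

The result depends on the installed version, so no expected output is given here.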

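So num_encoder cannot be zero with PyTorch >= 1.12, because forward now indexes self.layers[0] unconditionally. You could either downgrade PyTorch or, if you must use a recent version, modify the code to use only nn.TransformerDecoder. In that case, remember to add a LayerNorm for the image features manually to avoid a performance drop: with zero encoder layers, the encoder inside nn.Transformer still applies its final LayerNorm to them.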
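
The decoder-only option could look roughly like the sketch below. It is not the code in fpn_trans.py: the class name DecoderOnlyTransformer, the attribute memory_norm, and the default sizes are assumptions chosen for illustration. The argument order (src, tgt) mirrors the call self.transformer_layers[i](img_columns, polar_rays) from the traceback.

```python
import torch
from torch import nn


class DecoderOnlyTransformer(nn.Module):
    """Hypothetical stand-in for nn.Transformer(num_encoder_layers=0, ...)."""

    def __init__(self, d_model=256, nhead=8, num_decoder_layers=3,
                 dim_feedforward=512, dropout=0.1):
        super().__init__()
        # Replaces the final LayerNorm that nn.Transformer's encoder applies
        # to src even when the encoder has no layers.
        self.memory_norm = nn.LayerNorm(d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, dim_feedforward, dropout)
        self.decoder = nn.TransformerDecoder(layer, num_decoder_layers,
                                             norm=nn.LayerNorm(d_model))

    def forward(self, src, tgt):
        # src: image features, tgt: queries; both (length, batch, d_model).
        memory = self.memory_norm(src)
        return self.decoder(tgt, memory)


model = DecoderOnlyTransformer().eval()
src = torch.randn(10, 2, 256)
tgt = torch.randn(6, 2, 256)
with torch.no_grad():
    out = model(src, tgt)
print(out.shape)  # same shape as tgt
```

Note that the parameter name memory_norm differs from the encoder.norm name used inside nn.Transformer, so loading the released checkpoints into such a module would probably require remapping those keys.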

Sorry that I didn't state the PyTorch version we use in install.md. I will add it. : )

Thank you for your reply! I will try it.

fudan-zvg / PolarFormer

Why img_neck has no encoder layer? #5