MhLiao / DB

A PyTorch implementation of "Real-time Scene Text Detection with Differentiable Binarization".

Error when training model with MobileNetv3-Small backbone #269

Open CRCGlobal opened 3 years ago

CRCGlobal commented 3 years ago

I've successfully trained a MobileNetv3-Large backbone on ICDAR 2015. (See here for results.) However, I get the error below when trying to train a model with a MobileNetv3-Small backbone. @Microkitty, any suggestions?

Training command and resulting error:

$ CUDA_VISIBLE_DEVICES=0 python train.py experiments/seg_detector/ic15_mobilenet_v3_small_thre.yaml --num_gpus 1
[INFO] [2021-06-10 16:08:49,621] Training epoch 0
Traceback (most recent call last):
  File "train.py", line 70, in <module>
    main()
  File "train.py", line 67, in main
    trainer.train()
  File "/home/mroos/Code/gatekeeper_differentiable_binarization/trainer.py", line 86, in train
    epoch=epoch, step=self.steps)
  File "/home/mroos/Code/gatekeeper_differentiable_binarization/trainer.py", line 109, in train_step
    results = model.forward(batch, training=True)
  File "/home/mroos/Code/gatekeeper_differentiable_binarization/structure/model.py", line 56, in forward
    pred = self.model(data, training=self.training)
  File "/home/mroos/python_envs/env_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mroos/python_envs/env_torch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/mroos/python_envs/env_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mroos/Code/gatekeeper_differentiable_binarization/structure/model.py", line 19, in forward
    return self.decoder(self.backbone(data), *args, **kwargs)
  File "/home/mroos/python_envs/env_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mroos/Code/gatekeeper_differentiable_binarization/backbones/mobilenetv3.py", line 211, in forward
    x = self.features[stage](x)
  File "/home/mroos/python_envs/env_torch/lib/python3.6/site-packages/torch/nn/modules/container.py", line 164, in __getitem__
    return self._modules[self._get_abs_string_index(idx)]
  File "/home/mroos/python_envs/env_torch/lib/python3.6/site-packages/torch/nn/modules/container.py", line 154, in _get_abs_string_index
    raise IndexError('index {} is out of range'.format(idx))
IndexError: index 16 is out of range
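The backbone's `forward` loops over `range(17)`, a count written for the Large model. The Small `features` list is shorter, so `self.features[16]` does not exist. A small guard like the one below could run once in `__init__` and turn this into an explicit message that names the offending tap indices. `check_taps` is a hypothetical helper and is not part of the DB code.

```python
def check_taps(num_modules, taps):
    """Fail fast if any feature tap index is past the end of `features`.

    `num_modules` would be `len(self.features)`; `taps` are the indices whose
    outputs are handed to the decoder. Hypothetical helper, not DB code.
    """
    bad = [t for t in taps if not 0 <= t < num_modules]
    if bad:
        raise ValueError(
            f"tap indices {bad} are out of range for a backbone "
            f"with {num_modules} feature modules"
        )


# The Large taps fit a 17-module list...
check_taps(17, (3, 6, 12, 16))

# ...but not a shorter one (13 is only an illustrative length).
try:
    check_taps(13, (3, 6, 12, 16))
except ValueError as err:
    print(err)  # tap indices [16] are out of range for a backbone with 13 feature modules
```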

This is my .yaml file:

import:
    - 'experiments/seg_detector/base_ic15.yaml'
package: []
define:
  - name: 'Experiment'
    class: Experiment
    structure: 
        class: Structure
        builder: 
            class: Builder
            model: SegDetectorModel
            model_args:
                backbone: mobilenet_v3_small
                decoder: SegDetector
                decoder_args: 
                    adaptive: True
                    in_channels: [24, 40, 112, 960]
                    k: 50
                loss_class: L1BalanceCELoss
        representer:
            class: SegDetectorRepresenter
            max_candidates: 1000
        measurer:  
            class: QuadMeasurer
        visualizer:  
            class: SegDetectorVisualizer
    train: 
        class: TrainSettings
        data_loader: 
            class: DataLoader
            dataset: ^train_data
            batch_size: 8
            num_workers: 4
        checkpoint: 
            class: Checkpoint
            start_epoch: 0
            start_iter: 0
            resume: null
        model_saver: 
            class: ModelSaver
            dir_path: model
            save_interval: 1000
            signal_path: save
        scheduler: 
            class: OptimizerScheduler
            optimizer: "SGD"
            optimizer_args:
                lr: 0.007
                momentum: 0.9
                weight_decay: 0.0001
            learning_rate:  
                class: DecayLearningRate
                epochs: 1200
        epochs: 1200

    validation: &validate
        class: ValidationSettings
        data_loaders:
            icdar2015: 
                class: DataLoader
                dataset: ^validate_data
                batch_size: 1
                num_workers: 16
                collect_fn:
                    class: ICDARCollectFN
        visualize: false
        interval: 1000
        exempt: 1

    logger:
        class: Logger
        verbose: true
        level: info
        log_interval: 1000

    evaluation: *validate
CRCGlobal commented 3 years ago

I've found part of the problem, but my attempted fix still results in an error. The backbone layers we need to draw features from in the Small model are different from those in the Large model.

I made this change to the .yaml file:

in_channels: [24, 40, 96, 576]

For reference, see Table 2 of the original MobileNetV3 paper and the corresponding setting in mobilenetv3.py:

        elif mode == 'small':
            # refer to Table 2 in paper
            mobile_setting = [
                # k, exp, c,  se,     nl,  s,
                [3, 16,  16,  True,  'RE', 2],
                [3, 72,  24,  False, 'RE', 2],
                [3, 88,  24,  False, 'RE', 1],  ### 3
                [5, 96,  40,  True,  'HS', 2],
                [5, 240, 40,  True,  'HS', 1],
                [5, 240, 40,  True,  'HS', 1],  ### 6
                [5, 120, 48,  True,  'HS', 1],
                [5, 144, 48,  True,  'HS', 1],
                [5, 288, 96,  True,  'HS', 2],  ### 9
                [5, 576, 96,  True,  'HS', 1],
                [5, 576, 96,  True,  'HS', 1],
            ]
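To see which `features` index sits at which resolution, the cumulative stride can be accumulated row by row. The sketch below assumes, as the stage indices used in this thread imply, that `features[0]` is a stride-2 stem conv with 16 output channels, `features[1]` to `features[11]` are the Small bottlenecks in table order, and `features[12]` is a stride-1 1x1 conv to 576 channels. `feature_table` is a hypothetical helper.

```python
# Small setting copied from the table above: k, exp, c, se, nl, s
SMALL_SETTING = [
    [3, 16,  16,  True,  'RE', 2],
    [3, 72,  24,  False, 'RE', 2],
    [3, 88,  24,  False, 'RE', 1],
    [5, 96,  40,  True,  'HS', 2],
    [5, 240, 40,  True,  'HS', 1],
    [5, 240, 40,  True,  'HS', 1],
    [5, 120, 48,  True,  'HS', 1],
    [5, 144, 48,  True,  'HS', 1],
    [5, 288, 96,  True,  'HS', 2],
    [5, 576, 96,  True,  'HS', 1],
    [5, 576, 96,  True,  'HS', 1],
]


def feature_table(setting, stem_channels=16, last_channels=576):
    """Return (index, out_channels, stride) for every features[index].

    Assumes a stride-2 stem conv at index 0, one bottleneck per setting row,
    and a final stride-1 1x1 conv after the last row.
    """
    stride = 2  # assumed stem stride
    table = [(0, stem_channels, stride)]
    for idx, (_k, _exp, c, _se, _nl, s) in enumerate(setting, start=1):
        stride *= s  # each bottleneck multiplies the downsampling by its `s`
        table.append((idx, c, stride))
    table.append((len(setting) + 1, last_channels, stride))
    return table


for idx, channels, stride in feature_table(SMALL_SETTING):
    print(f"features[{idx:2d}]: {channels:3d} channels, 1/{stride}")
```

Read this way, indices 3, 6, 9 and 12 sit at 1/8, 1/16, 1/32 and 1/32, while the last index at each of 1/4, 1/8, 1/16 and 1/32 is 1, 3, 8 and 12 (16, 24, 48 and 576 channels).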

I then stored the mode ('small' or 'large') as an instance attribute of MobileNetV3 and used it in .forward() to select different backbone output layers for the Small and Large models.

    def forward(self, x):
        # Original classification forward, disabled for detection:
        # x = self.features(x)
        # x = x.mean(3).mean(2)
        # x = self.classifier(x)
        # return x
        if self.mode=='large':
            x2, x3, x4, x5 = None, None, None, None
            for stage in range(17): # https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/ppocr/modeling/backbones/det_mobilenet_v3.py
                x = self.features[stage](x)
                if stage == 3:  # if s == 2 and start_idx > 3
                    x2 = x
                elif stage == 6:
                    x3 = x
                elif stage == 12:
                    x4 = x
                elif stage == 16:
                    x5 = x
            return x2, x3, x4, x5
        elif self.mode=='small':
            x2, x3, x4, x5 = None, None, None, None
            for stage in range(13): # https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.1/ppocr/modeling/backbones/det_mobilenet_v3.py
                x = self.features[stage](x)
                if stage == 3:  # if s == 2 and start_idx > 3
                    x2 = x
                elif stage == 6:
                    x3 = x
                elif stage == 9:
                    x4 = x
                elif stage == 12:
                    x5 = x
            return x2, x3, x4, x5
        else:
            raise NotImplementedError

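But now I get a shape-mismatch error at an upsample-and-sum step in the decoder: the upsampled feature map is twice as large as the one it is added to (40 vs. 20 along dimension 3, the width), rather than a channel mismatch.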

$ CUDA_VISIBLE_DEVICES=0 python train.py experiments/seg_detector/ic15_mobilenet_v3_small_thre.yaml --num_gpus 1
[INFO] [2021-06-10 17:29:43,453] Training epoch 0
Traceback (most recent call last):
  File "train.py", line 70, in <module>
    main()
  File "train.py", line 67, in main
    trainer.train()
  File "/home/mroos/Code/gatekeeper_differentiable_binarization/trainer.py", line 86, in train
    epoch=epoch, step=self.steps)
  File "/home/mroos/Code/gatekeeper_differentiable_binarization/trainer.py", line 109, in train_step
    results = model.forward(batch, training=True)
  File "/home/mroos/Code/gatekeeper_differentiable_binarization/structure/model.py", line 56, in forward
    pred = self.model(data, training=self.training)
  File "/home/mroos/python_envs/env_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mroos/python_envs/env_torch/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/mroos/python_envs/env_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mroos/Code/gatekeeper_differentiable_binarization/structure/model.py", line 19, in forward
    return self.decoder(self.backbone(data), *args, **kwargs)
  File "/home/mroos/python_envs/env_torch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/mroos/Code/gatekeeper_differentiable_binarization/decoders/seg_detector.py", line 124, in forward
    out4 = self.up5(in5) + in4  # 1/16
RuntimeError: The size of tensor a (40) must match the size of tensor b (20) at non-singleton dimension 3
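The numbers in this message follow from the strides. The decoder appears to expect `in2` to `in5` at 1/4, 1/8, 1/16 and 1/32 of the input (the `# 1/16` comment on the failing line points that way), and `up5` doubles the width of `in5` before adding `in4`. Multiplying the stem's stride of 2 by the `s` column of the Small table puts taps 3, 6, 9 and 12 at 1/8, 1/16, 1/32 and 1/32, so `in5` and `in4` have the same width and the upsampled `in5` is twice as wide. The sketch below reproduces that arithmetic. It assumes every `up` is a plain 2x upsample and a 640-pixel-wide training crop, which is an assumption that happens to match the 40 and 20 above. `topdown_sum_widths` is a hypothetical helper.

```python
def topdown_sum_widths(tap_strides, width=640):
    """List (upsampled_width, skip_width) for each `up(x) + skip` in the top-down path.

    tap_strides: downsampling factors of (in2, in3, in4, in5).
    Assumes each `up` is a 2x upsample, as in `out4 = up5(in5) + in4`.
    """
    widths = [width // s for s in tap_strides]
    pairs = []
    # Walk from the coarsest level down: out4, then out3, then out2.
    for coarse in range(len(widths) - 1, 0, -1):
        pairs.append((2 * widths[coarse], widths[coarse - 1]))
    return pairs


# Small taps 3, 6, 9, 12 -> strides 8, 16, 32, 32: the first sum fails
print(topdown_sum_widths((8, 16, 32, 32)))  # [(40, 20), (40, 40), (80, 80)]
# Strides the decoder expects: every sum lines up
print(topdown_sum_widths((4, 8, 16, 32)))   # [(40, 40), (80, 80), (160, 160)]
```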
shantzhou commented 2 years ago

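In the Large model the output after stage 3 is downsampled 4x, but in the Small model it is already downsampled 8x, so the Large stage indices don't carry over. You need to fix either "mobile_setting" or the seg decoder's "forward".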
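One way to act on this on the backbone side, without touching `mobile_setting` or the decoder, is to tap the last block at each of 1/4, 1/8, 1/16 and 1/32. Multiplying the stem's stride of 2 by the `s` column of the Small table, that would be `features[1]`, `features[3]`, `features[8]` and `features[12]`, with 16, 24, 48 and 576 channels, so under these assumptions the .yaml would need `in_channels: [16, 24, 48, 576]`. This is an untested sketch rather than a confirmed fix: `FEATURE_TAPS` is a hypothetical module-level constant, the Large indices are copied from the existing Large branch, and `self.mode` is the attribute added earlier in this thread.

```python
# Hypothetical tap indices: the last features[] block at 1/4, 1/8, 1/16, 1/32.
FEATURE_TAPS = {
    'large': (3, 6, 12, 16),  # same indices as the existing Large branch
    'small': (1, 3, 8, 12),   # derived from the Small table's stride column
}


# Intended to replace MobileNetV3.forward (paste inside the class).
def forward(self, x):
    """Return the four feature maps (x2, x3, x4, x5) handed to the decoder."""
    if self.mode not in FEATURE_TAPS:
        raise NotImplementedError(self.mode)
    taps = FEATURE_TAPS[self.mode]
    outs = []
    # Run only as far as the deepest tap, so later modules are never indexed.
    for stage in range(taps[-1] + 1):
        x = self.features[stage](x)
        if stage in taps:
            outs.append(x)
    return tuple(outs)
```

Keeping the indices in one table means the loop bound is derived from the taps instead of being hard-coded per mode, which is what caused the original `IndexError`.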