OpenGVLab / InternImage

[CVPR 2023 Highlight] InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
https://arxiv.org/abs/2211.05778
MIT License

[bug] How to fine-tune a classification model (size mismatch for head.bias: copying a param with shape torch.Size([1000]) from checkpoint, the shape in current model is torch.Size([4]).) #297

Open jeonga0303 opened 3 months ago

jeonga0303 commented 3 months ago

I customized config.py. How do I fine-tune the classification model?
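For reference, a minimal sketch of the usual workaround for this exact mismatch, assuming the checkpoint stores its weights under a `'model'` key and `model` is the InternImage classifier already built with `NUM_CLASSES = 4`: drop the pretrained classifier keys and load the rest non-strictly, so the backbone is restored while the new 4-class head keeps its fresh initialization.

```python
import torch

checkpoint = torch.load('internimage_b_1k_224.pth', map_location='cpu')
# assumption: the weights may be nested under a 'model' key
state_dict = checkpoint.get('model', checkpoint)

# drop the 1000-class head so it is not copied into the 4-class model
for key in ('head.weight', 'head.bias'):
    state_dict.pop(key, None)

missing, unexpected = model.load_state_dict(state_dict, strict=False)
print('missing keys:', missing)  # should list only the head parameters
```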

jeonga0303 commented 2 months ago

How do I convert the head from Nc1 (the 1000 pretrained classes) to Nc2 (my 4 classes)?
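(For anyone landing here: Nc1 is the number of classes in the pretrained checkpoint's head, Nc2 the number in the current model. A quick way to inspect Nc1, under the same checkpoint-layout assumption as above:)

```python
import torch

ckpt = torch.load('internimage_b_1k_224.pth', map_location='cpu')
state_dict = ckpt.get('model', ckpt)

Nc1 = state_dict['head.bias'].shape[0]  # 1000 for the ImageNet-1k checkpoint
print(f'Nc1 (pretrained head classes): {Nc1}')
# Nc2 comes from the model built with _C.MODEL.NUM_CLASSES (4 here):
# Nc2 = model.head.bias.shape[0]
```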

jeonga0303 commented 2 months ago

I changed the files in the following order.

Training is running now; the dataset is large, so I will share the results later.

1. Download the pretrained `.pth` file.

2. `config.py`:

   ```python
   _C.DATA.IMG_SIZE = 224
   _C.MODEL.PRETRAINED = 'internimage_b_1k_224.pth'
   _C.MODEL.NUM_CLASSES = 4
   ```
3. `util.py`: modify the `load_pretrained` function to handle the head-size mismatch (Nc1: 1000 → Nc2: 4).

   ```python
   # inside load_pretrained(): skip the pretrained head when the class counts differ
   if 'head.bias' in state_dict:
       head_bias_pretrained = state_dict['head.bias']
       Nc1 = head_bias_pretrained.shape[0]  # classes in the checkpoint head (1000)
       Nc2 = model.head.bias.shape[0]       # classes in the current model (4)
       logger.info(f'{Nc1}, {Nc2}')
       if Nc1 != Nc2:
           # re-initialize the new head and drop the pretrained head weights
           model.head.weight = torch.nn.Parameter(torch.zeros_like(model.head.weight))
           model.head.bias = torch.nn.Parameter(torch.zeros_like(model.head.bias))
           state_dict.pop('head.weight', None)
           state_dict.pop('head.bias', None)
   ```
4. `dataset/samplers.py`: modify the `__iter__` method (the module needs `import numpy as np` and `import torch` at the top).

   ```python
   def __iter__(self):
       # deterministically shuffle based on epoch
       g = torch.Generator()
       g.manual_seed(self.epoch)

       # fixed seed so the rank partition is identical across epochs
       t = torch.Generator()
       t.manual_seed(0)

       indices = torch.randperm(len(self.dataset), generator=t).tolist()
       indices = [i for i in indices if i % self.num_parts == self.rank]

       # add extra samples to make the list evenly divisible
       while len(indices) < self.total_size_parts:
           indices += indices[:(self.total_size_parts - len(indices))]
       indices = indices[:self.total_size_parts]
       assert len(indices) == self.total_size_parts, \
           f'Length of indices ({len(indices)}) does not match total_size_parts ({self.total_size_parts})'

       # subsample this replica's shard
       indices = indices[self.rank // self.num_parts:self.total_size_parts:self.num_replicas // self.num_parts]

       # epoch-dependent shuffle of the shard
       index = torch.randperm(len(indices), generator=g).tolist()
       indices = list(np.array(indices)[index])

       assert len(indices) == self.num_samples, \
           f'Length of indices ({len(indices)}) does not match num_samples ({self.num_samples})'

       return iter(indices)
   ```
5. Launch command:

   ```
   python -m torch.distributed.launch --nproc_per_node 2 --master_port 12345 main.py --cfg configs/without_lr_decay/internimage_b_1k_224_custom.yaml --data-path [data-path] --pretrained internimage_b_1k_224.pth --batch-size 120
   ```
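A side note on the launcher: `torch.distributed.launch` is deprecated in recent PyTorch releases in favor of `torchrun`. The equivalent invocation would be the one below; note that `torchrun` passes the local rank through the `LOCAL_RANK` environment variable rather than a `--local_rank` argument, so `main.py` may need a small adjustment if it parses that flag.

```
torchrun --nproc_per_node 2 --master_port 12345 main.py --cfg configs/without_lr_decay/internimage_b_1k_224_custom.yaml --data-path [data-path] --pretrained internimage_b_1k_224.pth --batch-size 120
```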

My GPUs are 2× A100.

2024-06-11: training succeeded (image-classification fine-tuning).

jeonga0303 commented 2 months ago

[bug]

Training does not seem to be making progress; the loss stays the same at every step. Can anyone tell me why?
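One thing worth checking, given that step 3 above zero-initializes the head: confirm the head parameters are actually being updated and that the learning rate is not effectively zero. A small hypothetical diagnostic (`log_head_progress` is not part of the repo) that could be called once per epoch:

```python
def log_head_progress(model, optimizer, epoch):
    # If this norm stays at 0.0 across epochs, the optimizer is not updating
    # the head (e.g. it was built before the head was replaced, or lr is ~0).
    head_norm = model.head.weight.detach().norm().item()
    lrs = [group['lr'] for group in optimizer.param_groups]
    print(f'epoch {epoch}: head weight norm = {head_norm:.6f}, lrs = {lrs}')
```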
