Alibaba-MIIL / ML_Decoder

Official PyTorch implementation of "ML-Decoder: Scalable and Versatile Classification Head" (2021)
MIT License

Missing key(s) in state_dict: "head.fc.weight", "head.fc.bias" #13

Closed (zahragolpa closed 2 years ago)

zahragolpa commented 2 years ago

Hello,

I am fascinated by your great work and I'm trying to experiment with your code a little bit.

I want to create an instance of the TResNet class and then load the pre-trained model for the Stanford Cars dataset into the model using PyTorch. However, it seems like the state dictionary of the TResNet class is not compatible with that of the pre-trained model that you have shared in the model zoo.

Here is the code that I use:

model = TResNet([3, 4, 23, 3], num_classes=80, in_chans=3, first_two_layers=Bottleneck).cuda()

state = torch.load(PATH_TO_THE_PRETRAINED_MODEL, map_location='cpu')
filtered_dict = {k: v for k, v in state['model'].items() if
                             (k in model.state_dict() and 'head.fc' not in k)}
# here is the issue!
model = model.load_state_dict(filtered_dict, strict=True)
model.eval()

This is the error that I get:

RuntimeError: Error(s) in loading state_dict for TResNet: Missing key(s) in state_dict: "head.fc.weight", "head.fc.bias".

Even if I ignore this by exception handling, I get poor results on the test images; all of the classes have a score around 55% to 62%.

Can you please help me solve this issue? Thank you in advance.
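For readers following along, here is a minimal sketch of the key comparison that strict loading effectively performs. It uses plain Python lists instead of real tensors, and every key name except "head.fc.weight" and "head.fc.bias" is a hypothetical placeholder, not taken from the actual checkpoint.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) in the sense used by strict loading.

    missing:    keys the model expects but the checkpoint does not provide
    unexpected: keys the checkpoint provides but the model does not have
    """
    model_keys = list(model_keys)
    checkpoint_keys = list(checkpoint_keys)
    missing = [k for k in model_keys if k not in checkpoint_keys]
    unexpected = [k for k in checkpoint_keys if k not in model_keys]
    return missing, unexpected


# Hypothetical key names, for illustration only.
model_keys = ["body.conv1.weight", "head.fc.weight", "head.fc.bias"]
checkpoint_keys = ["body.conv1.weight", "head.decoder.query_embed.weight"]

# Same filter as in the snippet above: keep only keys the model knows,
# and drop anything under head.fc.
filtered_keys = [
    k for k in checkpoint_keys if k in model_keys and "head.fc" not in k
]

missing, unexpected = diff_state_dict_keys(model_keys, filtered_keys)
print(missing)     # keys that strict=True would report as missing
print(unexpected)  # empty, because the filter removed everything else
```

Because the filter drops every head key, the model's own head.fc entries can never be satisfied, so strict=True has to fail. If the error is suppressed, the head presumably keeps its random initialization, which would be consistent with the near-uniform scores described above. Separately, load_state_dict returns a result object listing missing and unexpected keys, not the model, so assigning its return value to model replaces the model.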

sorrowyn commented 2 years ago

do_bottleneck_head = False
model = TResNet([3, 4, 23, 3], num_classes=80, in_chans=3, first_two_layers=do_bottleneck_head).cuda()

mrT23 commented 2 years ago

@zahragolpa

    import torch

    from src_files.models.tresnet.tresnet import Bottleneck, TResNet
    from src_files.ml_decoder.ml_decoder import add_ml_decoder_head

    # Stanford Cars has 196 classes. Build the backbone, then swap in the
    # ML-Decoder head before loading so the model's keys match the checkpoint.
    model = TResNet([3, 4, 23, 3], num_classes=196, in_chans=3, first_two_layers=Bottleneck)
    model = add_ml_decoder_head(model, num_classes=196, num_of_groups=100, decoder_embedding=768)
    state = torch.load('./tresnet_l_stanford_card_96.41.pth', map_location='cpu')
    model.load_state_dict(state['model'], strict=True)
    model.cuda().half().eval()

should work.

That said, I recommend using create_model with the proper args; it's safer.

zahragolpa commented 2 years ago

Thank you for your comment @sorrowyn. When I apply your suggestion to the code, I get the following error:

<ipython-input-2-4026dba6c24c> in _make_layer(self, block, planes, blocks, stride, use_se, anti_alias_layer)
    590     def _make_layer(self, block, planes, blocks, stride=1, use_se=True, anti_alias_layer=None):
    591         downsample = None
--> 592         if stride != 1 or self.inplanes != planes * block.expansion:
    593             layers = []
    594             if stride == 2:

AttributeError: 'bool' object has no attribute 'expansion'

It seems that setting first_two_layers to False raises an error in the _make_layer function, because it expects a block class such as Bottleneck (it reads block.expansion), not a boolean.
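The failure can be reproduced without the repository. The sketch below is a simplified stand-in, not the actual TResNet code: the Bottleneck class and the needs_downsample helper are hypothetical, and only the condition is copied from the line flagged in the traceback.

```python
class Bottleneck:
    """Stand-in for a residual block class; only `expansion` matters here."""
    expansion = 4


def needs_downsample(block, inplanes, planes, stride=1):
    # Same condition as the flagged line in _make_layer. With stride == 1
    # the left side is False, so Python goes on to evaluate block.expansion.
    return stride != 1 or inplanes != planes * block.expansion


# Passing the class works: 64 != 64 * 4, so a downsample path is needed.
result_with_class = needs_downsample(Bottleneck, inplanes=64, planes=64)

# Passing a boolean fails as soon as block.expansion is read.
try:
    needs_downsample(False, inplanes=64, planes=64)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(result_with_class)
print(error_message)
```

The argument is used as a class to build layers from, so a flag such as do_bottleneck_head cannot stand in for it.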

zahragolpa commented 2 years ago

Thank you @mrT23! It works. I am closing this issue as it has been resolved with your solution.