TheShadow29 / zsgnet-pytorch

Official implementation of ICCV19 oral paper Zero-Shot grounding of Objects from Natural Language Queries (https://arxiv.org/abs/1908.07129)

Demo example #14

Closed FioPio closed 2 years ago

FioPio commented 2 years ago

Hey! I love the project and I would like to use it in my master's thesis (with the corresponding credit, of course). I managed to train a model, but I feel a little lost when using it. Would it be possible to have a small example of loading a model, giving it one image and one query, and getting the output?

Thank you so much in advance,

Ferriol

TheShadow29 commented 2 years ago

@FioPio Sorry for the late reply. Somehow I missed this.

To initialize a model, see get_default_net

To load an existing model, use `mdl.load_state_dict(state_dict)` where `state_dict = torch.load('/path/to/savedmodel')`

For data processing, the easiest way would be to create a csv file with (image, query) pairs, the same as for flickr.

Otherwise, you can use the simple_item_getter function here (https://github.com/TheShadow29/zsgnet-pytorch/blob/master/code/dat_loader.py#L98). Then pass the output through the collater and then pass it through the model.

Hope that helps.

FioPio commented 2 years ago

Hey, I am trying to proceed as you said, but I am encountering a small problem: it looks like the mdl.py file does not have a load_state_dict, so I am trying to use the same approach you use in get_default_net (ssd_vgg.build_ssd()), but that only accepts "train" and "test" options, and since I just want to use the trained model I am not sure which one to pick.

What do you think about it?

Thank you,

Ferriol

TheShadow29 commented 2 years ago

@FioPio Sorry, I wasn't clear. The model returned by get_default_net is the network itself, so you call load_state_dict on it:

model = get_default_net()
pretrained_model = torch.load('/path/to/checkpoint/')
model.load_state_dict(pretrained_model)

Does that make sense?

FioPio commented 2 years ago

Hey, I tried this:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

#import mdl
from mdl import get_default_net
from extended_config import cfg as conf
#from dat_loader import get_data

import torch

MODEL_PATH = "tmp/models/referit_try.pth"

print('=============================')
print('Loading model')
print('=============================')

cfg = conf
cfg.mdl_to_use = 'ssd_vgg_t'
cfg.ds_to_use = 'refclef'
cfg.num_gpus = 1
cfg.bs=16
cfg.nw=4

model = get_default_net(cfg=cfg)
pretrained_model = torch.load(MODEL_PATH)
model.load_state_dict(pretrained_model)

But then I get this error:

loaded pretrained vgg backbone
Traceback (most recent call last):
  File "code/demoFio.py", line 28, in <module>
    model.load_state_dict(pretrained_model)
  File "/home/*****/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ZSGNet:
    Missing key(s) in state_dict: "backbone.encoder.vgg.0.weight", "backbone.encoder.vgg.0.bias", "backbone.encoder.vgg.2.weight", "backbone.encoder.vgg.2.bias", "backbone.encoder.vgg.5.weight", "backbone.encoder.vgg.5.bias", "backbone.encoder.vgg.7.weight", "backbone.encoder.vgg.7.bias", "backbone.encoder.vgg.10.weight", "backbone.encoder.vgg.10.bias", "backbone.encoder.vgg.12.weight", "backbone.encoder.vgg.12.bias", "backbone.encoder.vgg.14.weight", "backbone.encoder.vgg.14.bias", "backbone.encoder.vgg.17.weight", "backbone.encoder.vgg.17.bias", "backbone.encoder.vgg.19.weight", "backbone.encoder.vgg.19.bias", "backbone.encoder.vgg.21.weight", "backbone.encoder.vgg.21.bias", "backbone.encoder.vgg.24.weight", "backbone.encoder.vgg.24.bias", "backbone.encoder.vgg.26.weight", "backbone.encoder.vgg.26.bias", "backbone.encoder.vgg.28.weight", "backbone.encoder.vgg.28.bias", "backbone.encoder.vgg.31.weight", "backbone.encoder.vgg.31.bias", "backbone.encoder.vgg.33.weight", "backbone.encoder.vgg.33.bias", "backbone.encoder.fproj1.weight", "backbone.encoder.fproj1.bias", "backbone.encoder.fproj2.weight", "backbone.encoder.fproj2.bias", "backbone.encoder.fproj3.weight", "backbone.encoder.fproj3.bias", "backbone.encoder.extras.0.weight", "backbone.encoder.extras.0.bias", "backbone.encoder.extras.1.weight", "backbone.encoder.extras.1.bias", "backbone.encoder.extras.2.weight", "backbone.encoder.extras.2.bias", "backbone.encoder.extras.3.weight", "backbone.encoder.extras.3.bias", "backbone.encoder.extras.4.weight", "backbone.encoder.extras.4.bias", "backbone.encoder.extras.5.weight", "backbone.encoder.extras.5.bias", "backbone.encoder.extras.6.weight", "backbone.encoder.extras.6.bias", "backbone.encoder.extras.7.weight", "backbone.encoder.extras.7.bias", "backbone.encoder.loc.0.weight", "backbone.encoder.loc.0.bias", "backbone.encoder.loc.1.weight", "backbone.encoder.loc.1.bias", "backbone.encoder.loc.2.weight", "backbone.encoder.loc.2.bias", "backbone.encoder.loc.3.weight", "backbone.encoder.loc.3.bias", "backbone.encoder.loc.4.weight", "backbone.encoder.loc.4.bias", "backbone.encoder.loc.5.weight", "backbone.encoder.loc.5.bias", "backbone.encoder.conf.0.weight", "backbone.encoder.conf.0.bias", "backbone.encoder.conf.1.weight", "backbone.encoder.conf.1.bias", "backbone.encoder.conf.2.weight", "backbone.encoder.conf.2.bias", "backbone.encoder.conf.3.weight", "backbone.encoder.conf.3.bias", "backbone.encoder.conf.4.weight", "backbone.encoder.conf.4.bias", "backbone.encoder.conf.5.weight", "backbone.encoder.conf.5.bias", "att_reg_box.0.0.weight", "att_reg_box.0.0.bias", "att_reg_box.1.0.weight", "att_reg_box.1.0.bias", "att_reg_box.2.0.weight", "att_reg_box.2.0.bias", "att_reg_box.3.0.weight", "att_reg_box.3.0.bias", "att_reg_box.4.0.weight", "att_reg_box.4.0.bias", "att_reg_box.5.weight", "att_reg_box.5.bias", "lstm.weight_ih_l0", "lstm.weight_hh_l0", "lstm.bias_ih_l0", "lstm.bias_hh_l0", "lstm.weight_ih_l0_reverse", "lstm.weight_hh_l0_reverse", "lstm.bias_ih_l0_reverse", "lstm.bias_hh_l0_reverse". 
    Unexpected key(s) in state_dict: "model_state_dict", "optimizer_state_dict", "scheduler_state_dict", "num_it", "num_epoch", "cfgtxt", "best_met"

And if I replace it with pretrained_model['model_state_dict'] it does not work.

I am sorry to cause you so much trouble, but I am genuinely lost.

FioPio commented 2 years ago

I managed to reach this point:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from mdl import get_default_net
from extended_config import cfg as conf

import torch

MODEL_PATH = "tmp/models/referit_try.pth"

print('=============================')
print('Loading model')
print('=============================')

cfg = conf
cfg.mdl_to_use = 'ssd_vgg_t'
cfg.ds_to_use = 'refclef'
cfg.num_gpus = 1
cfg.bs=16
cfg.nw=4

model = get_default_net(cfg=cfg)
pretrained_model = torch.load(MODEL_PATH)
model.load_state_dict(pretrained_model['model_state_dict'], strict=False)

model.phase='test'

But now I am a bit lost. I am trying to load an image, let's call it demo.jpg, and use a query like "the man on the left". Should I use the forward function?

Thank you in advance

TheShadow29 commented 2 years ago

@FioPio great! You would need to do all the pre-processing for the demo.jpg as in the dataloader.

The easiest way would be to copy-paste this (https://github.com/TheShadow29/zsgnet-pytorch/blob/master/code/dat_loader.py#L98) function but give it the image and the query as input.

You can then pass this through a collater.

Then you need to do a forward pass through the model.

After that, pass the output through the evaluator. This would give you the bounding box predictions and probability score. You can then choose the box with the highest probability.
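
In rough outline (just a sketch with placeholder step names, not the exact function calls):

# pseudo-outline of the steps above
# item = <pre-process demo.jpg and the query, as simple_item_getter does>
# batch = collater([item])            # batch of size 1
# mdl_out = model(batch)              # forward pass
# preds = evaluator(mdl_out, batch)   # bounding boxes + probability scores
# -> keep the box with the highest probability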

Let me know if it works out.

FioPio commented 2 years ago

I managed to load it as you said, but what should I forward to the model? And to the evaluator? I have the code as follows:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from mdl import get_default_net
from extended_config import cfg as conf
import PIL
import spacy
import numpy as np

import torch

def pil2tensor(image, dtype: np.dtype):
    "Convert PIL style `image` array to torch style image tensor."
    a = np.asarray(image)
    if a.ndim == 2:
        a = np.expand_dims(a, 2)
    a = np.transpose(a, (1, 0, 2))
    a = np.transpose(a, (2, 1, 0))
    return torch.from_numpy(a.astype(dtype, copy=False))

def collater(batch):
    qlens = torch.Tensor([i['qlens'] for i in batch])
    max_qlen = int(qlens.max().item())
    out_dict = {}
    for k in batch[0]:
        out_dict[k] = torch.stack([b[k] for b in batch]).float()
    out_dict['qvec'] = out_dict['qvec'][:, :max_qlen]

    return out_dict

MODEL_PATH = 'tmp/models/referit_try.pth'
IMAGE_FILE = 'demo.jpg'
QUERY      = 'The white cup'

print('=============================')
print('Loading models')
print('=============================')

cfg = conf
cfg.mdl_to_use = 'ssd_vgg_t'
cfg.ds_to_use = 'refclef'
cfg.num_gpus = 1
cfg.bs=16
cfg.nw=4

model = get_default_net(cfg=cfg)
pretrained_model = torch.load(MODEL_PATH)
model.load_state_dict(pretrained_model['model_state_dict'], strict=False)

model.phase='test'

nlp = spacy.load('en_core_web_md')

print('=============================')
print('Preprocessing image')
print('=============================')

phrase_len=50
img = PIL.Image.open(IMAGE_FILE).convert('RGB')
h, w = img.height, img.width
q_chosen = QUERY
q_chosen = q_chosen.strip()
qtmp = nlp(str(q_chosen))
if len(qtmp) == 0:
    raise NotImplementedError
qlen = len(qtmp)
q_chosen = q_chosen + ' PD'*(phrase_len - qlen)
q_chosen_emb = nlp(q_chosen)
if not len(q_chosen_emb) == phrase_len:
    q_chosen_emb = q_chosen_emb[:phrase_len]
q_chosen_emb_vecs = np.array([q.vector for q in q_chosen_emb])
img = img.resize((cfg.resize_img[0], cfg.resize_img[1]))
target = np.array([
    0 / h, 0 / w,
    0 / h, 0 / w
])

img = pil2tensor(img, np.float_).float().div_(255)
out = {
    'img': img,
    'qvec': torch.from_numpy(q_chosen_emb_vecs),
    'qlens': torch.tensor(qlen),
    'img_size': torch.tensor([h, w])
}

col = collater([out])

print('=============================')
print('Predict')
print('=============================')

Thank you in advance!

TheShadow29 commented 2 years ago

@FioPio

It should be just

evl = Evaluator(ratios, scales, cfg)
model.eval()
mdl_out = model(col)
predictions = evl(mdl_out, col)

pred_boxes = predictions['pred_boxes']
pred_scores = predictions['pred_scores']

Let me know if it works out.

FioPio commented 2 years ago

Nice, now the whole code looks like this:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from mdl import get_default_net
from extended_config import cfg as conf
import PIL
import spacy
import numpy as np
from evaluator import Evaluator

import torch

def pil2tensor(image, dtype: np.dtype):
    "Convert PIL style `image` array to torch style image tensor."
    a = np.asarray(image)
    if a.ndim == 2:
        a = np.expand_dims(a, 2)
    a = np.transpose(a, (1, 0, 2))
    a = np.transpose(a, (2, 1, 0))
    return torch.from_numpy(a.astype(dtype, copy=False))

def collater(batch):
    qlens = torch.Tensor([i['qlens'] for i in batch])
    max_qlen = int(qlens.max().item())
    out_dict = {}
    for k in batch[0]:
        out_dict[k] = torch.stack([b[k] for b in batch]).float()
    out_dict['qvec'] = out_dict['qvec'][:, :max_qlen]

    return out_dict

MODEL_PATH = 'tmp/models/referit_try.pth'
IMAGE_FILE = 'demo.jpg'
QUERY      = 'The white cup'

print('=============================')
print('Loading models')
print('=============================')

cfg = conf
cfg.mdl_to_use = 'ssd_vgg_t'
cfg.ds_to_use = 'refclef'
cfg.num_gpus = 1
cfg.bs=16
cfg.nw=4

model = get_default_net(cfg=cfg)
pretrained_model = torch.load(MODEL_PATH)
model.load_state_dict(pretrained_model['model_state_dict'], strict=False)

model.phase='test'

nlp = spacy.load('en_core_web_md')

print('=============================')
print('Preprocessing image')
print('=============================')

phrase_len=50
img = PIL.Image.open(IMAGE_FILE).convert('RGB')
h, w = img.height, img.width
q_chosen = QUERY
q_chosen = q_chosen.strip()
qtmp = nlp(str(q_chosen))
if len(qtmp) == 0:
    raise NotImplementedError
qlen = len(qtmp)
q_chosen = q_chosen + ' PD'*(phrase_len - qlen)
q_chosen_emb = nlp(q_chosen)
if not len(q_chosen_emb) == phrase_len:
    q_chosen_emb = q_chosen_emb[:phrase_len]
q_chosen_emb_vecs = np.array([q.vector for q in q_chosen_emb])
img = img.resize((cfg.resize_img[0], cfg.resize_img[1]))
target = np.array([
    0 / h, 0 / w,
    0 / h, 0 / w
])

img = pil2tensor(img, np.float_).float().div_(255)
out = {
    'img': img,
    'qvec': torch.from_numpy(q_chosen_emb_vecs),
    'qlens': torch.tensor(qlen),
    'img_size': torch.tensor([h, w])
}

col = collater([out])

print('=============================')
print('Predict')
print('=============================')

if type(cfg['ratios']) != list:
    ratios = eval(cfg['ratios'], {})
else:
    print("f on ratios")

if type(cfg['scales']) != list:
    scales = cfg['scale_factor'] * np.array(eval(cfg['scales'], {}))
else:

    print("f on scales")

evl = Evaluator(ratios, scales, cfg)
model.eval()
mdl_out = model(col)
predictions = evl(mdl_out, col)

pred_boxes = predictions['pred_boxes']
pred_scores = predictions['pred_scores']

But when running it I encounter the following error:

Traceback (most recent call last):
  File "code/demoFio.py", line 107, in <module>
    mdl_out = model(col)
  File "/home/*******/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/*******/Documents/UMU/semestre4TFM/code/demos/LG/condaTest/zsgnet-pytorch/code/mdl.py", line 360, in forward
    req_emb = self.apply_lstm(req_embs, qlens, max_qlen)
  File "/home/*********/Documents/UMU/semestre4TFM/code/demos/LG/condaTest/zsgnet-pytorch/code/mdl.py", line 319, in apply_lstm
    lstm_out1, (self.hidden, _) = self.lstm(packed_embed_inp, self.hidden)
  File "/home/**********/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/******/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 557, in forward
    return self.forward_packed(input, hx)
  File "/home/********/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 550, in forward_packed
    output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
  File "/home/******/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 525, in forward_impl
    self.num_layers, self.dropout, self.training, self.bidirectional)
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #4 'mat1'

I have found that I should send the data to the GPU with .to(device), but I do not know which variables I should send. Do you know which ones?

Thank you

TheShadow29 commented 2 years ago

@FioPio You can just send all of them to gpu. See https://github.com/TheShadow29/zsgnet-pytorch/blob/master/code/utils.py#L405
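
For the collated dict, that would be something like this (a sketch, assuming a CUDA device is available):

device = torch.device('cuda:0')
# move every tensor in the collated batch to the GPU
for k in col:
    col[k] = col[k].to(device)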

FioPio commented 2 years ago

Nice, I am getting a new error, but I am getting closer! Now I get:

/home/****/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/rnn.py:525: RuntimeWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
  self.num_layers, self.dropout, self.training, self.bidirectional)
Traceback (most recent call last):
  File "code/demoFio.py", line 108, in <module>
    mdl_out = model(col)
  File "/home/*****/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/*****/Documents/UMU/semestre4TFM/code/demos/LG/condaTest/zsgnet-pytorch/code/mdl.py", line 360, in forward
    req_emb = self.apply_lstm(req_embs, qlens, max_qlen)
  File "/home/*****/Documents/UMU/semestre4TFM/code/demos/LG/condaTest/zsgnet-pytorch/code/mdl.py", line 319, in apply_lstm
    lstm_out1, (self.hidden, _) = self.lstm(packed_embed_inp, self.hidden)
  File "/home/****/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/****/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 557, in forward
    return self.forward_packed(input, hx)
  File "/home/*****/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 550, in forward_packed
    output, hidden = self.forward_impl(input, hx, batch_sizes, max_batch_size, sorted_indices)
  File "/home/*******/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 525, in forward_impl
    self.num_layers, self.dropout, self.training, self.bidirectional)
RuntimeError: Input and parameter tensors are not at the same device, found input tensor at cuda:0 and parameter tensor at cpu

So I just need to move the parameter tensor to the CPU, but I do not know what that is, so I am doing a little research; if you have any clue it would help me a lot.

TheShadow29 commented 2 years ago

@FioPio I believe you just need to do mdl.to(torch.device('cuda:0')) in this case.
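
That is, in your script, just before the forward pass (sketch):

device = torch.device('cuda:0')
model.to(device)       # model parameters on the GPU
mdl_out = model(col)   # col is already on the same device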

FioPio commented 2 years ago

Thank you! It worked, but now I am having one last problem:

Traceback (most recent call last):
  File "code/demoFio.py", line 120, in <module>
    predictions = evl(mdl_out, col)
  File "/home/*****/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/*****/Documents/UMU/semestre4TFM/code/demos/LG/condaTest/zsgnet-pytorch/code/evaluator.py", line 82, in forward
    anchs, reg_box)
  File "/home/*****/Documents/UMU/semestre4TFM/code/demos/LG/condaTest/zsgnet-pytorch/code/anchors.py", line 191, in reg_params_to_bbox
    a111 = anc1[..., 2:] * b1 + anc1[..., :2]
RuntimeError: The size of tensor a (17460) must match the size of tensor b (1940) at non-singleton dimension 

The code looks like this:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from mdl import get_default_net
from extended_config import cfg as conf
import PIL
import spacy
import numpy as np
from evaluator import Evaluator
import mdl

import torch

def pil2tensor(image, dtype: np.dtype):
    "Convert PIL style `image` array to torch style image tensor."
    a = np.asarray(image)
    if a.ndim == 2:
        a = np.expand_dims(a, 2)
    a = np.transpose(a, (1, 0, 2))
    a = np.transpose(a, (2, 1, 0))
    return torch.from_numpy(a.astype(dtype, copy=False))

def collater(batch):
    qlens = torch.Tensor([i['qlens'] for i in batch])
    max_qlen = int(qlens.max().item())
    out_dict = {}
    for k in batch[0]:
        out_dict[k] = torch.stack([b[k] for b in batch]).float()
    out_dict['qvec'] = out_dict['qvec'][:, :max_qlen]

    return out_dict

MODEL_PATH = 'tmp/models/referit_try.pth'
IMAGE_FILE = 'demo.jpg'
QUERY      = 'The white cup'

print('=============================')
print('Loading models')
print('=============================')

cfg = conf
cfg.mdl_to_use = 'ssd_vgg_t'
cfg.ds_to_use = 'refclef'
cfg.num_gpus = 1
cfg.bs=16
cfg.nw=4
device = torch.device(cfg.device)

model = get_default_net(cfg=cfg)
pretrained_model = torch.load(MODEL_PATH)
state_dict = pretrained_model['model_state_dict']
#for k in state_dict.keys():
#    state_dict[k] = state_dict[k].to(device)

#model.load_state_dict(state_dict, strict=False)
model.load_state_dict(pretrained_model['model_state_dict'], strict=False)

model.phase='test'

nlp = spacy.load('en_core_web_md') #.to(device)

print('=============================')
print('Preprocessing image')
print('=============================')

phrase_len=50
img = PIL.Image.open(IMAGE_FILE).convert('RGB')
h, w = img.height, img.width
q_chosen = QUERY
q_chosen = q_chosen.strip()
qtmp = nlp(str(q_chosen))
if len(qtmp) == 0:
    raise NotImplementedError
qlen = len(qtmp)
q_chosen = q_chosen + ' PD'*(phrase_len - qlen)
q_chosen_emb = nlp(q_chosen)
if not len(q_chosen_emb) == phrase_len:
    q_chosen_emb = q_chosen_emb[:phrase_len]
q_chosen_emb_vecs = np.array([q.vector for q in q_chosen_emb])
img = img.resize((cfg.resize_img[0], cfg.resize_img[1]))
target = np.array([
    0 / h, 0 / w,
    0 / h, 0 / w
])

img = pil2tensor(img, np.float_).float().div_(255)

target = np.array([ 0, 0, 0, 0])
out = {
    'img': img,
    'qvec': torch.from_numpy(q_chosen_emb_vecs),
    'qlens': torch.tensor(qlen),
    'annot': torch.from_numpy(target).float(),
    'img_size': torch.tensor([h, w])
}

col = collater([out])

print('=============================')
print('Prediction')
print('=============================')

ratios = eval(cfg['ratios'], {})
scales = cfg['scale_factor'] * np.array(eval(cfg['scales'], {}))

evl = Evaluator(ratios, scales, cfg)

#####################################
#model = model.to(device=torch.device('cpu'))#device)
#####################################

model.to(device)

model.eval()
for c in col.keys():
    col[c] = col[c].to(device)
mdl_out = model(col)
#model = model.to(device=torch.device('cpu'))
predictions = evl(mdl_out, col)

pred_boxes = predictions['pred_boxes']
pred_scores = predictions['pred_scores']

TheShadow29 commented 2 years ago

@FioPio This is a bit strange.

Could you confirm the sizes of anc1 and b1? That is the input just before this function: https://github.com/TheShadow29/zsgnet-pytorch/blob/b9881d4372a63bace72420105de3d7e01f91fe47/code/evaluator.py#L81

anc1 should be the anchors from the images, and b1 the box parameters. The function it stops at converts the anchors and regression parameters into actual bounding box predictions.
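
For example, you could print the shapes right before the call that fails (a sketch, using the variable names from your traceback):

# in evaluator.py, just before reg_params_to_bbox(anchs, reg_box)
print('anchs:', anchs.shape)
print('reg_box:', reg_box.shape)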

FioPio commented 2 years ago

-> Size of anchs:
17460
-> Size of reg_box:
1

Is that what you were looking for? I am posting my code just in case:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from mdl import get_default_net
from extended_config import cfg as conf
import PIL
import spacy
import numpy as np
from evaluator import Evaluator
import mdl

import torch

def pil2tensor(image, dtype: np.dtype):
    "Convert PIL style `image` array to torch style image tensor."
    a = np.asarray(image)
    if a.ndim == 2:
        a = np.expand_dims(a, 2)
    a = np.transpose(a, (1, 0, 2))
    a = np.transpose(a, (2, 1, 0))
    return torch.from_numpy(a.astype(dtype, copy=False))

def collater(batch):
    qlens = torch.Tensor([i['qlens'] for i in batch])
    max_qlen = int(qlens.max().item())
    out_dict = {}
    for k in batch[0]:
        out_dict[k] = torch.stack([b[k] for b in batch]).float()
    out_dict['qvec'] = out_dict['qvec'][:, :max_qlen]

    return out_dict

MODEL_PATH = 'tmp/models/referit_try.pth'
IMAGE_FILE = 'demo.jpg'
QUERY      = 'The white cup'

print('=============================')
print('Loading models')
print('=============================')

cfg = conf
cfg.mdl_to_use = 'ssd_vgg_t'
cfg.ds_to_use = 'refclef'
cfg.num_gpus = 1
cfg.bs=16
cfg.nw=4
device = torch.device(cfg.device)

model = get_default_net(cfg=cfg)
pretrained_model = torch.load(MODEL_PATH)
state_dict = pretrained_model['model_state_dict']
#for k in state_dict.keys():
#    state_dict[k] = state_dict[k].to(device)

#model.load_state_dict(state_dict, strict=False)
model.load_state_dict(pretrained_model['model_state_dict'], strict=False)

model.phase='test'

nlp = spacy.load('en_core_web_md') #.to(device)

print('=============================')
print('Preprocessing image')
print('=============================')

phrase_len=50
img = PIL.Image.open(IMAGE_FILE).convert('RGB')
h, w = img.height, img.width
q_chosen = QUERY
q_chosen = q_chosen.strip()
qtmp = nlp(str(q_chosen))
if len(qtmp) == 0:
    raise NotImplementedError
qlen = len(qtmp)
q_chosen = q_chosen + ' PD'*(phrase_len - qlen)
q_chosen_emb = nlp(q_chosen)
if not len(q_chosen_emb) == phrase_len:
    q_chosen_emb = q_chosen_emb[:phrase_len]
q_chosen_emb_vecs = np.array([q.vector for q in q_chosen_emb])
img = img.resize((cfg.resize_img[0], cfg.resize_img[1]))
target = np.array([
    0 / h, 0 / w,
    0 / h, 0 / w
])

img = pil2tensor(img, np.float_).float().div_(255)

target = np.array([ 0, 0, 0, 0])
out = {
    'img': img,
    'qvec': torch.from_numpy(q_chosen_emb_vecs),
    'qlens': torch.tensor(qlen),
    'annot': torch.from_numpy(target).float(),
    'img_size': torch.tensor([h, w])
}

col = collater([out])

print('=============================')
print('Prediction')
print('=============================')

ratios = eval(cfg['ratios'], {})
scales = cfg['scale_factor'] * np.array(eval(cfg['scales'], {}))

evl = Evaluator(ratios, scales, cfg)

#####################################
#model = model.to(device=torch.device('cpu'))#device)
#####################################

model.to(device)

model.eval()
for c in col.keys():
    col[c] = col[c].to(device)
mdl_out = model(col)
#model = model.to(device=torch.device('cpu'))
predictions = evl(mdl_out, col)

pred_boxes = predictions['pred_boxes']
pred_scores = predictions['pred_scores']

TheShadow29 commented 2 years ago

@FioPio Sorry for not being clear. I meant anchs.shape and reg_box.shape.

I am not completely sure, but it is possible that a .squeeze call somewhere is causing some issues. Can you try processing two images at once? Just replace

# col = collater([out]) 
col = collater([out, out])

FioPio commented 2 years ago

@TheShadow29 Do not worry, I applied both changes (printing the shape of both elements and trying the list of two outs), so I can provide more information. I get the following info/error:

-> Size of anchs:
torch.Size([17460, 4])
-> Size of reg_box:
torch.Size([2, 1940, 4])
Traceback (most recent call last):
  File "code/demoFio.py", line 121, in <module>
    predictions = evl(mdl_out, col)
  File "/home/******/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/******/Documents/UMU/semestre4TFM/code/demos/LG/condaTest/zsgnet-pytorch/code/evaluator.py", line 91, in forward
    anchs, reg_box)
  File "/home/******/Documents/UMU/semestre4TFM/code/demos/LG/condaTest/zsgnet-pytorch/code/anchors.py", line 191, in reg_params_to_bbox
    a111 = anc1[..., 2:] * b1 + anc1[..., :2]
RuntimeError: The size of tensor a (17460) must match the size of tensor b (1940) at non-singleton dimension 1

TheShadow29 commented 2 years ago

@FioPio I see the error now. You would need to provide the correct number of anchors to the model. Currently, the model assumes only 1 anchor per location, but it should be 9 (len(ratios) * len(scales)).

Just replace

ratios = ...
scales = ...
n_anchors = len(ratios) * len(scales)
model = get_default_net(num_anchors=n_anchors, cfg=cfg)

See this part for reference: https://github.com/TheShadow29/zsgnet-pytorch/blob/b9881d4372a63bace72420105de3d7e01f91fe47/code/main_dist.py#L34
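
Putting it together with the config values (a sketch, assuming cfg stores ratios and scales as strings, as in your script):

ratios = eval(cfg['ratios'], {})
scales = cfg['scale_factor'] * np.array(eval(cfg['scales'], {}))
n_anchors = len(ratios) * len(scales)

model = get_default_net(num_anchors=n_anchors, cfg=cfg)
evl = Evaluator(ratios, scales, cfg)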

Let me know if it works out.

FioPio commented 2 years ago

Thank you @TheShadow29, it worked! The problem now is that it gives me a very low confidence (actually always the same, 0.018146815) and does not recognize any object well. I wonder if I should use another model; I am using the one I trained with the referit database.

The final code I am using is:

#!/usr/bin/python3
# -*- coding: utf-8 -*-

from mdl import get_default_net
from extended_config import cfg as conf
import PIL
import spacy
import numpy as np
from evaluator import Evaluator
import mdl

import torch

def pil2tensor(image, dtype: np.dtype):
    "Convert PIL style `image` array to torch style image tensor."
    a = np.asarray(image)
    if a.ndim == 2:
        a = np.expand_dims(a, 2)
    a = np.transpose(a, (1, 0, 2))
    a = np.transpose(a, (2, 1, 0))
    return torch.from_numpy(a.astype(dtype, copy=False))

def collater(batch):
    qlens = torch.Tensor([i['qlens'] for i in batch])
    max_qlen = int(qlens.max().item())
    out_dict = {}
    for k in batch[0]:
        out_dict[k] = torch.stack([b[k] for b in batch]).float()
    out_dict['qvec'] = out_dict['qvec'][:, :max_qlen]

    return out_dict

MODEL_PATH = 'tmp/models/referit_try.pth'
IMAGE_FILE = 'demo.jpg'
QUERY      = 'The knife' #white cup'

print('=============================')
print('Loading models')
print('=============================')

cfg = conf
cfg.mdl_to_use = 'ssd_vgg_t'
cfg.ds_to_use = 'refclef'
cfg.num_gpus = 1
cfg.bs=16
cfg.nw=4
device = torch.device(cfg.device)
ratios = eval(cfg['ratios'], {})
scales = cfg['scale_factor'] * np.array(eval(cfg['scales'], {}))

n_anchors = len(ratios) * len(scales)

model = get_default_net(num_anchors=n_anchors,cfg=cfg)
pretrained_model = torch.load(MODEL_PATH)
state_dict = pretrained_model['model_state_dict']
#for k in state_dict.keys():
#    state_dict[k] = state_dict[k].to(device)

#model.load_state_dict(state_dict, strict=False)
model.load_state_dict(pretrained_model['model_state_dict'], strict=False)

model.phase='test'

nlp = spacy.load('en_core_web_md') #.to(device)

print('=============================')
print('Preprocessing image')
print('=============================')

phrase_len=50
img = PIL.Image.open(IMAGE_FILE).convert('RGB')
h, w = img.height, img.width
q_chosen = QUERY
q_chosen = q_chosen.strip()
qtmp = nlp(str(q_chosen))
if len(qtmp) == 0:
    raise NotImplementedError
qlen = len(qtmp)
q_chosen = q_chosen + ' PD'*(phrase_len - qlen)
q_chosen_emb = nlp(q_chosen)
if not len(q_chosen_emb) == phrase_len:
    q_chosen_emb = q_chosen_emb[:phrase_len]
q_chosen_emb_vecs = np.array([q.vector for q in q_chosen_emb])
img = img.resize((cfg.resize_img[0], cfg.resize_img[1]))
target = np.array([
    0 / h, 0 / w,
    0 / h, 0 / w
])

img = pil2tensor(img, np.float_).float().div_(255)

target = np.array([ 0, 0, 0, 0])
out = {
    'img': img,
    'qvec': torch.from_numpy(q_chosen_emb_vecs),
    'qlens': torch.tensor(qlen),
    'annot': torch.from_numpy(target).float(),
    'img_size': torch.tensor([h, w]),
    'idxs': torch.tensor(qlen),
}

#col = collater([out])
col = collater([out])

print('=============================')
print('Prediction')
print('=============================')

evl = Evaluator(ratios, scales, cfg)

#####################################
#model = model.to(device=torch.device('cpu'))#device)
#####################################

model.to(device)

model.eval()
for c in col.keys():
    col[c] = col[c].to(device)
mdl_out = model(col)
#model = model.to(device=torch.device('cpu'))
predictions = evl(mdl_out, col)

pred_boxes = predictions['pred_boxes']
pred_scores = predictions['pred_scores']

import cv2

img = cv2.imread(IMAGE_FILE)

box = pred_boxes.data.cpu().numpy()[0]
score = pred_scores.data.cpu().numpy()[0]
print(box)
print(score)

cv2.rectangle(img, pt1=(int(box[0]),int(box[1])), pt2=(int(box[2]),int(box[3])), color=(255,0,0), thickness=10)

# Resizing image so it fits on the screen
scale_percent = 30 # percent of original size
width = int(img.shape[1] * scale_percent / 100)
height = int(img.shape[0] * scale_percent / 100)
dim = (width, height)

# resize image
resized = cv2.resize(img, dim, interpolation = cv2.INTER_AREA)

cv2.imshow("result", resized)
cv2.waitKey(0)

TheShadow29 commented 2 years ago

@FioPio can you do a sanity check:

  1. change the query input to a random tensor
  2. change the image input to a random tensor

Just to double check if the prob remains the same.

Also, maybe changing the size to 600 x 600 would be useful.

FioPio commented 2 years ago

Hey, changing the size to 600 x 600 gives the same result. How do I set it to a random tensor?

Thank you in advance.

TheShadow29 commented 2 years ago

@FioPio

I meant that you could just set them to random tensors.

out = {
'img': torch.rand(*img.shape), # for random image
'qvec': torch.rand(*qvec.shape) # for random text.
... 
}

FioPio commented 2 years ago

It gives the same output. Could you check my code just in case, to see if there is any other issue with it?

TheShadow29 commented 2 years ago

@FioPio I see you have used model.load_state_dict(pretrained_model['model_state_dict'], strict=False).

I think it should be strict=True. Are there any variables that are not matching?

FioPio commented 2 years ago

@TheShadow29 Yes, if I set that to True then I get this error:

Traceback (most recent call last):
  File "code/demoFio.py", line 62, in <module>
    model.load_state_dict(pretrained_model['model_state_dict'], strict=True)
  File "/home/*****/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ZSGNet:
    Missing key(s) in state_dict: "backbone.encoder.vgg.0.weight", "backbone.encoder.vgg.0.bias", "backbone.encoder.vgg.2.weight", "backbone.encoder.vgg.2.bias", "backbone.encoder.vgg.5.weight", "backbone.encoder.vgg.5.bias", "backbone.encoder.vgg.7.weight", "backbone.encoder.vgg.7.bias", "backbone.encoder.vgg.10.weight", "backbone.encoder.vgg.10.bias", "backbone.encoder.vgg.12.weight", "backbone.encoder.vgg.12.bias", "backbone.encoder.vgg.14.weight", "backbone.encoder.vgg.14.bias", "backbone.encoder.vgg.17.weight", "backbone.encoder.vgg.17.bias", "backbone.encoder.vgg.19.weight", "backbone.encoder.vgg.19.bias", "backbone.encoder.vgg.21.weight", "backbone.encoder.vgg.21.bias", "backbone.encoder.vgg.24.weight", "backbone.encoder.vgg.24.bias", "backbone.encoder.vgg.26.weight", "backbone.encoder.vgg.26.bias", "backbone.encoder.vgg.28.weight", "backbone.encoder.vgg.28.bias", "backbone.encoder.vgg.31.weight", "backbone.encoder.vgg.31.bias", "backbone.encoder.vgg.33.weight", "backbone.encoder.vgg.33.bias", "backbone.encoder.fproj1.weight", "backbone.encoder.fproj1.bias", "backbone.encoder.fproj2.weight", "backbone.encoder.fproj2.bias", "backbone.encoder.fproj3.weight", "backbone.encoder.fproj3.bias", "backbone.encoder.extras.0.weight", "backbone.encoder.extras.0.bias", "backbone.encoder.extras.1.weight", "backbone.encoder.extras.1.bias", "backbone.encoder.extras.2.weight", "backbone.encoder.extras.2.bias", "backbone.encoder.extras.3.weight", "backbone.encoder.extras.3.bias", "backbone.encoder.extras.4.weight", "backbone.encoder.extras.4.bias", "backbone.encoder.extras.5.weight", "backbone.encoder.extras.5.bias", "backbone.encoder.extras.6.weight", "backbone.encoder.extras.6.bias", "backbone.encoder.extras.7.weight", "backbone.encoder.extras.7.bias", "backbone.encoder.loc.0.weight", "backbone.encoder.loc.0.bias", "backbone.encoder.loc.1.weight", "backbone.encoder.loc.1.bias", "backbone.encoder.loc.2.weight", "backbone.encoder.loc.2.bias", "backbone.encoder.loc.3.weight", "backbone.encoder.loc.3.bias", "backbone.encoder.loc.4.weight", "backbone.encoder.loc.4.bias", "backbone.encoder.loc.5.weight", "backbone.encoder.loc.5.bias", "backbone.encoder.conf.0.weight", "backbone.encoder.conf.0.bias", "backbone.encoder.conf.1.weight", "backbone.encoder.conf.1.bias", "backbone.encoder.conf.2.weight", "backbone.encoder.conf.2.bias", "backbone.encoder.conf.3.weight", "backbone.encoder.conf.3.bias", "backbone.encoder.conf.4.weight", "backbone.encoder.conf.4.bias", "backbone.encoder.conf.5.weight", "backbone.encoder.conf.5.bias", "att_reg_box.0.0.weight", "att_reg_box.0.0.bias", "att_reg_box.1.0.weight", "att_reg_box.1.0.bias", "att_reg_box.2.0.weight", "att_reg_box.2.0.bias", "att_reg_box.3.0.weight", "att_reg_box.3.0.bias", "att_reg_box.4.0.weight", "att_reg_box.4.0.bias", "att_reg_box.5.weight", "att_reg_box.5.bias", "lstm.weight_ih_l0", "lstm.weight_hh_l0", "lstm.bias_ih_l0", "lstm.bias_hh_l0", "lstm.weight_ih_l0_reverse", "lstm.weight_hh_l0_reverse", "lstm.bias_ih_l0_reverse", "lstm.bias_hh_l0_reverse". 
    Unexpected key(s) in state_dict: "module.backbone.encoder.conv1.weight", "module.backbone.encoder.bn1.weight", "module.backbone.encoder.bn1.bias", "module.backbone.encoder.bn1.running_mean", "module.backbone.encoder.bn1.running_var", "module.backbone.encoder.bn1.num_batches_tracked", "module.backbone.encoder.layer1.0.conv1.weight", "module.backbone.encoder.layer1.0.bn1.weight", "module.backbone.encoder.layer1.0.bn1.bias", "module.backbone.encoder.layer1.0.bn1.running_mean", "module.backbone.encoder.layer1.0.bn1.running_var", "module.backbone.encoder.layer1.0.bn1.num_batches_tracked", "module.backbone.encoder.layer1.0.conv2.weight", "module.backbone.encoder.layer1.0.bn2.weight", "module.backbone.encoder.layer1.0.bn2.bias", "module.backbone.encoder.layer1.0.bn2.running_mean", "module.backbone.encoder.layer1.0.bn2.running_var", "module.backbone.encoder.layer1.0.bn2.num_batches_tracked", "module.backbone.encoder.layer1.0.conv3.weight", "module.backbone.encoder.layer1.0.bn3.weight", "module.backbone.encoder.layer1.0.bn3.bias", "module.backbone.encoder.layer1.0.bn3.running_mean", "module.backbone.encoder.layer1.0.bn3.running_var", "module.backbone.encoder.layer1.0.bn3.num_batches_tracked", "module.backbone.encoder.layer1.0.downsample.0.weight", "module.backbone.encoder.layer1.0.downsample.1.weight", "module.backbone.encoder.layer1.0.downsample.1.bias", "module.backbone.encoder.layer1.0.downsample.1.running_mean", "module.backbone.encoder.layer1.0.downsample.1.running_var", "module.backbone.encoder.layer1.0.downsample.1.num_batches_tracked", "module.backbone.encoder.layer1.1.conv1.weight", "module.backbone.encoder.layer1.1.bn1.weight", "module.backbone.encoder.layer1.1.bn1.bias", "module.backbone.encoder.layer1.1.bn1.running_mean", "module.backbone.encoder.layer1.1.bn1.running_var", "module.backbone.encoder.layer1.1.bn1.num_batches_tracked", "module.backbone.encoder.layer1.1.conv2.weight", "module.backbone.encoder.layer1.1.bn2.weight", "module.backbone.encoder.layer1.1.bn2.bias", "module.backbone.encoder.layer1.1.bn2.running_mean", "module.backbone.encoder.layer1.1.bn2.running_var", "module.backbone.encoder.layer1.1.bn2.num_batches_tracked", "module.backbone.encoder.layer1.1.conv3.weight", "module.backbone.encoder.layer1.1.bn3.weight", "module.backbone.encoder.layer1.1.bn3.bias", "module.backbone.encoder.layer1.1.bn3.running_mean", "module.backbone.encoder.layer1.1.bn3.running_var", "module.backbone.encoder.layer1.1.bn3.num_batches_tracked", "module.backbone.encoder.layer1.2.conv1.weight", "module.backbone.encoder.layer1.2.bn1.weight", "module.backbone.encoder.layer1.2.bn1.bias", "module.backbone.encoder.layer1.2.bn1.running_mean", "module.backbone.encoder.layer1.2.bn1.running_var", "module.backbone.encoder.layer1.2.bn1.num_batches_tracked", "module.backbone.encoder.layer1.2.conv2.weight", "module.backbone.encoder.layer1.2.bn2.weight", "module.backbone.encoder.layer1.2.bn2.bias", "module.backbone.encoder.layer1.2.bn2.running_mean", "module.backbone.encoder.layer1.2.bn2.running_var", "module.backbone.encoder.layer1.2.bn2.num_batches_tracked", "module.backbone.encoder.layer1.2.conv3.weight", "module.backbone.encoder.layer1.2.bn3.weight", "module.backbone.encoder.layer1.2.bn3.bias", "module.backbone.encoder.layer1.2.bn3.running_mean", "module.backbone.encoder.layer1.2.bn3.running_var", "module.backbone.encoder.layer1.2.bn3.num_batches_tracked", "module.backbone.encoder.layer2.0.conv1.weight", "module.backbone.encoder.layer2.0.bn1.weight", "module.backbone.encoder.layer2.0.bn1.bias", 
"module.backbone.encoder.layer2.0.bn1.running_mean", "module.backbone.encoder.layer2.0.bn1.running_var", "module.backbone.encoder.layer2.0.bn1.num_batches_tracked", "module.backbone.encoder.layer2.0.conv2.weight", "module.backbone.encoder.layer2.0.bn2.weight", "module.backbone.encoder.layer2.0.bn2.bias", "module.backbone.encoder.layer2.0.bn2.running_mean", "module.backbone.encoder.layer2.0.bn2.running_var", "module.backbone.encoder.layer2.0.bn2.num_batches_tracked", "module.backbone.encoder.layer2.0.conv3.weight", "module.backbone.encoder.layer2.0.bn3.weight", "module.backbone.encoder.layer2.0.bn3.bias", "module.backbone.encoder.layer2.0.bn3.running_mean", "module.backbone.encoder.layer2.0.bn3.running_var", "module.backbone.encoder.layer2.0.bn3.num_batches_tracked", "module.backbone.encoder.layer2.0.downsample.0.weight", "module.backbone.encoder.layer2.0.downsample.1.weight", "module.backbone.encoder.layer2.0.downsample.1.bias", "module.backbone.encoder.layer2.0.downsample.1.running_mean", "module.backbone.encoder.layer2.0.downsample.1.running_var", "module.backbone.encoder.layer2.0.downsample.1.num_batches_tracked", "module.backbone.encoder.layer2.1.conv1.weight", "module.backbone.encoder.layer2.1.bn1.weight", "module.backbone.encoder.layer2.1.bn1.bias", "module.backbone.encoder.layer2.1.bn1.running_mean", "module.backbone.encoder.layer2.1.bn1.running_var", "module.backbone.encoder.layer2.1.bn1.num_batches_tracked", "module.backbone.encoder.layer2.1.conv2.weight", "module.backbone.encoder.layer2.1.bn2.weight", "module.backbone.encoder.layer2.1.bn2.bias", "module.backbone.encoder.layer2.1.bn2.running_mean", "module.backbone.encoder.layer2.1.bn2.running_var", "module.backbone.encoder.layer2.1.bn2.num_batches_tracked", "module.backbone.encoder.layer2.1.conv3.weight", "module.backbone.encoder.layer2.1.bn3.weight", "module.backbone.encoder.layer2.1.bn3.bias", "module.backbone.encoder.layer2.1.bn3.running_mean", "module.backbone.encoder.layer2.1.bn3.running_var", "module.backbone.encoder.layer2.1.bn3.num_batches_tracked", "module.backbone.encoder.layer2.2.conv1.weight", "module.backbone.encoder.layer2.2.bn1.weight", "module.backbone.encoder.layer2.2.bn1.bias", "module.backbone.encoder.layer2.2.bn1.running_mean", "module.backbone.encoder.layer2.2.bn1.running_var", "module.backbone.encoder.layer2.2.bn1.num_batches_tracked", "module.backbone.encoder.layer2.2.conv2.weight", "module.backbone.encoder.layer2.2.bn2.weight", "module.backbone.encoder.layer2.2.bn2.bias", "module.backbone.encoder.layer2.2.bn2.running_mean", "module.backbone.encoder.layer2.2.bn2.running_var", "module.backbone.encoder.layer2.2.bn2.num_batches_tracked", "module.backbone.encoder.layer2.2.conv3.weight", "module.backbone.encoder.layer2.2.bn3.weight", "module.backbone.encoder.layer2.2.bn3.bias", "module.backbone.encoder.layer2.2.bn3.running_mean", "module.backbone.encoder.layer2.2.bn3.running_var", "module.backbone.encoder.layer2.2.bn3.num_batches_tracked", "module.backbone.encoder.layer2.3.conv1.weight", "module.backbone.encoder.layer2.3.bn1.weight", "module.backbone.encoder.layer2.3.bn1.bias", "module.backbone.encoder.layer2.3.bn1.running_mean", "module.backbone.encoder.layer2.3.bn1.running_var", "module.backbone.encoder.layer2.3.bn1.num_batches_tracked", "module.backbone.encoder.layer2.3.conv2.weight", "module.backbone.encoder.layer2.3.bn2.weight", "module.backbone.encoder.layer2.3.bn2.bias", "module.backbone.encoder.layer2.3.bn2.running_mean", "module.backbone.encoder.layer2.3.bn2.running_var", 
"module.backbone.encoder.layer2.3.bn2.num_batches_tracked", "module.backbone.encoder.layer2.3.conv3.weight", "module.backbone.encoder.layer2.3.bn3.weight", "module.backbone.encoder.layer2.3.bn3.bias", "module.backbone.encoder.layer2.3.bn3.running_mean", "module.backbone.encoder.layer2.3.bn3.running_var", "module.backbone.encoder.layer2.3.bn3.num_batches_tracked", "module.backbone.encoder.layer3.0.conv1.weight", "module.backbone.encoder.layer3.0.bn1.weight", "module.backbone.encoder.layer3.0.bn1.bias", "module.backbone.encoder.layer3.0.bn1.running_mean", "module.backbone.encoder.layer3.0.bn1.running_var", "module.backbone.encoder.layer3.0.bn1.num_batches_tracked", "module.backbone.encoder.layer3.0.conv2.weight", "module.backbone.encoder.layer3.0.bn2.weight", "module.backbone.encoder.layer3.0.bn2.bias", "module.backbone.encoder.layer3.0.bn2.running_mean", "module.backbone.encoder.layer3.0.bn2.running_var", "module.backbone.encoder.layer3.0.bn2.num_batches_tracked", "module.backbone.encoder.layer3.0.conv3.weight", "module.backbone.encoder.layer3.0.bn3.weight", "module.backbone.encoder.layer3.0.bn3.bias", "module.backbone.encoder.layer3.0.bn3.running_mean", "module.backbone.encoder.layer3.0.bn3.running_var", "module.backbone.encoder.layer3.0.bn3.num_batches_tracked", "module.backbone.encoder.layer3.0.downsample.0.weight", "module.backbone.encoder.layer3.0.downsample.1.weight", "module.backbone.encoder.layer3.0.downsample.1.bias", "module.backbone.encoder.layer3.0.downsample.1.running_mean", "module.backbone.encoder.layer3.0.downsample.1.running_var", "module.backbone.encoder.layer3.0.downsample.1.num_batches_tracked", "module.backbone.encoder.layer3.1.conv1.weight", "module.backbone.encoder.layer3.1.bn1.weight", "module.backbone.encoder.layer3.1.bn1.bias", "module.backbone.encoder.layer3.1.bn1.running_mean", "module.backbone.encoder.layer3.1.bn1.running_var", "module.backbone.encoder.layer3.1.bn1.num_batches_tracked", "module.backbone.encoder.layer3.1.conv2.weight", "module.backbone.encoder.layer3.1.bn2.weight", "module.backbone.encoder.layer3.1.bn2.bias", "module.backbone.encoder.layer3.1.bn2.running_mean", "module.backbone.encoder.layer3.1.bn2.running_var", "module.backbone.encoder.layer3.1.bn2.num_batches_tracked", "module.backbone.encoder.layer3.1.conv3.weight", "module.backbone.encoder.layer3.1.bn3.weight", "module.backbone.encoder.layer3.1.bn3.bias", "module.backbone.encoder.layer3.1.bn3.running_mean", "module.backbone.encoder.layer3.1.bn3.running_var", "module.backbone.encoder.layer3.1.bn3.num_batches_tracked", "module.backbone.encoder.layer3.2.conv1.weight", "module.backbone.encoder.layer3.2.bn1.weight", "module.backbone.encoder.layer3.2.bn1.bias", "module.backbone.encoder.layer3.2.bn1.running_mean", "module.backbone.encoder.layer3.2.bn1.running_var", "module.backbone.encoder.layer3.2.bn1.num_batches_tracked", "module.backbone.encoder.layer3.2.conv2.weight", "module.backbone.encoder.layer3.2.bn2.weight", "module.backbone.encoder.layer3.2.bn2.bias", "module.backbone.encoder.layer3.2.bn2.running_mean", "module.backbone.encoder.layer3.2.bn2.running_var", "module.backbone.encoder.layer3.2.bn2.num_batches_tracked", "module.backbone.encoder.layer3.2.conv3.weight", "module.backbone.encoder.layer3.2.bn3.weight", "module.backbone.encoder.layer3.2.bn3.bias", "module.backbone.encoder.layer3.2.bn3.running_mean", "module.backbone.encoder.layer3.2.bn3.running_var", "module.backbone.encoder.layer3.2.bn3.num_batches_tracked", "module.backbone.encoder.layer3.3.conv1.weight", 
"module.backbone.encoder.layer3.3.bn1.weight", "module.backbone.encoder.layer3.3.bn1.bias", "module.backbone.encoder.layer3.3.bn1.running_mean", "module.backbone.encoder.layer3.3.bn1.running_var", "module.backbone.encoder.layer3.3.bn1.num_batches_tracked", "module.backbone.encoder.layer3.3.conv2.weight", "module.backbone.encoder.layer3.3.bn2.weight", "module.backbone.encoder.layer3.3.bn2.bias", "module.backbone.encoder.layer3.3.bn2.running_mean", "module.backbone.encoder.layer3.3.bn2.running_var", "module.backbone.encoder.layer3.3.bn2.num_batches_tracked", "module.backbone.encoder.layer3.3.conv3.weight", "module.backbone.encoder.layer3.3.bn3.weight", "module.backbone.encoder.layer3.3.bn3.bias", "module.backbone.encoder.layer3.3.bn3.running_mean", "module.backbone.encoder.layer3.3.bn3.running_var", "module.backbone.encoder.layer3.3.bn3.num_batches_tracked", "module.backbone.encoder.layer3.4.conv1.weight", "module.backbone.encoder.layer3.4.bn1.weight", "module.backbone.encoder.layer3.4.bn1.bias", "module.backbone.encoder.layer3.4.bn1.running_mean", "module.backbone.encoder.layer3.4.bn1.running_var", "module.backbone.encoder.layer3.4.bn1.num_batches_tracked", "module.backbone.encoder.layer3.4.conv2.weight", "module.backbone.encoder.layer3.4.bn2.weight", "module.backbone.encoder.layer3.4.bn2.bias", "module.backbone.encoder.layer3.4.bn2.running_mean", "module.backbone.encoder.layer3.4.bn2.running_var", "module.backbone.encoder.layer3.4.bn2.num_batches_tracked", "module.backbone.encoder.layer3.4.conv3.weight", "module.backbone.encoder.layer3.4.bn3.weight", "module.backbone.encoder.layer3.4.bn3.bias", "module.backbone.encoder.layer3.4.bn3.running_mean", "module.backbone.encoder.layer3.4.bn3.running_var", "module.backbone.encoder.layer3.4.bn3.num_batches_tracked", "module.backbone.encoder.layer3.5.conv1.weight", "module.backbone.encoder.layer3.5.bn1.weight", "module.backbone.encoder.layer3.5.bn1.bias", "module.backbone.encoder.layer3.5.bn1.running_mean", "module.backbone.encoder.layer3.5.bn1.running_var", "module.backbone.encoder.layer3.5.bn1.num_batches_tracked", "module.backbone.encoder.layer3.5.conv2.weight", "module.backbone.encoder.layer3.5.bn2.weight", "module.backbone.encoder.layer3.5.bn2.bias", "module.backbone.encoder.layer3.5.bn2.running_mean", "module.backbone.encoder.layer3.5.bn2.running_var", "module.backbone.encoder.layer3.5.bn2.num_batches_tracked", "module.backbone.encoder.layer3.5.conv3.weight", "module.backbone.encoder.layer3.5.bn3.weight", "module.backbone.encoder.layer3.5.bn3.bias", "module.backbone.encoder.layer3.5.bn3.running_mean", "module.backbone.encoder.layer3.5.bn3.running_var", "module.backbone.encoder.layer3.5.bn3.num_batches_tracked", "module.backbone.encoder.layer4.0.conv1.weight", "module.backbone.encoder.layer4.0.bn1.weight", "module.backbone.encoder.layer4.0.bn1.bias", "module.backbone.encoder.layer4.0.bn1.running_mean", "module.backbone.encoder.layer4.0.bn1.running_var", "module.backbone.encoder.layer4.0.bn1.num_batches_tracked", "module.backbone.encoder.layer4.0.conv2.weight", "module.backbone.encoder.layer4.0.bn2.weight", "module.backbone.encoder.layer4.0.bn2.bias", "module.backbone.encoder.layer4.0.bn2.running_mean", "module.backbone.encoder.layer4.0.bn2.running_var", "module.backbone.encoder.layer4.0.bn2.num_batches_tracked", "module.backbone.encoder.layer4.0.conv3.weight", "module.backbone.encoder.layer4.0.bn3.weight", "module.backbone.encoder.layer4.0.bn3.bias", "module.backbone.encoder.layer4.0.bn3.running_mean", 
"module.backbone.encoder.layer4.0.bn3.running_var", "module.backbone.encoder.layer4.0.bn3.num_batches_tracked", "module.backbone.encoder.layer4.0.downsample.0.weight", "module.backbone.encoder.layer4.0.downsample.1.weight", "module.backbone.encoder.layer4.0.downsample.1.bias", "module.backbone.encoder.layer4.0.downsample.1.running_mean", "module.backbone.encoder.layer4.0.downsample.1.running_var", "module.backbone.encoder.layer4.0.downsample.1.num_batches_tracked", "module.backbone.encoder.layer4.1.conv1.weight", "module.backbone.encoder.layer4.1.bn1.weight", "module.backbone.encoder.layer4.1.bn1.bias", "module.backbone.encoder.layer4.1.bn1.running_mean", "module.backbone.encoder.layer4.1.bn1.running_var", "module.backbone.encoder.layer4.1.bn1.num_batches_tracked", "module.backbone.encoder.layer4.1.conv2.weight", "module.backbone.encoder.layer4.1.bn2.weight", "module.backbone.encoder.layer4.1.bn2.bias", "module.backbone.encoder.layer4.1.bn2.running_mean", "module.backbone.encoder.layer4.1.bn2.running_var", "module.backbone.encoder.layer4.1.bn2.num_batches_tracked", "module.backbone.encoder.layer4.1.conv3.weight", "module.backbone.encoder.layer4.1.bn3.weight", "module.backbone.encoder.layer4.1.bn3.bias", "module.backbone.encoder.layer4.1.bn3.running_mean", "module.backbone.encoder.layer4.1.bn3.running_var", "module.backbone.encoder.layer4.1.bn3.num_batches_tracked", "module.backbone.encoder.layer4.2.conv1.weight", "module.backbone.encoder.layer4.2.bn1.weight", "module.backbone.encoder.layer4.2.bn1.bias", "module.backbone.encoder.layer4.2.bn1.running_mean", "module.backbone.encoder.layer4.2.bn1.running_var", "module.backbone.encoder.layer4.2.bn1.num_batches_tracked", "module.backbone.encoder.layer4.2.conv2.weight", "module.backbone.encoder.layer4.2.bn2.weight", "module.backbone.encoder.layer4.2.bn2.bias", "module.backbone.encoder.layer4.2.bn2.running_mean", "module.backbone.encoder.layer4.2.bn2.running_var", "module.backbone.encoder.layer4.2.bn2.num_batches_tracked", "module.backbone.encoder.layer4.2.conv3.weight", "module.backbone.encoder.layer4.2.bn3.weight", "module.backbone.encoder.layer4.2.bn3.bias", "module.backbone.encoder.layer4.2.bn3.running_mean", "module.backbone.encoder.layer4.2.bn3.running_var", "module.backbone.encoder.layer4.2.bn3.num_batches_tracked", "module.backbone.encoder.fc.weight", "module.backbone.encoder.fc.bias", "module.backbone.fpn.P7_2.weight", "module.backbone.fpn.P7_2.bias", "module.backbone.fpn.P6.weight", "module.backbone.fpn.P6.bias", "module.backbone.fpn.P5_1.weight", "module.backbone.fpn.P5_1.bias", "module.backbone.fpn.P5_2.weight", "module.backbone.fpn.P5_2.bias", "module.backbone.fpn.P4_1.weight", "module.backbone.fpn.P4_1.bias", "module.backbone.fpn.P4_2.weight", "module.backbone.fpn.P4_2.bias", "module.backbone.fpn.P3_1.weight", "module.backbone.fpn.P3_1.bias", "module.backbone.fpn.P3_2.weight", "module.backbone.fpn.P3_2.bias", "module.att_reg_box.0.0.weight", "module.att_reg_box.0.0.bias", "module.att_reg_box.1.0.weight", "module.att_reg_box.1.0.bias", "module.att_reg_box.2.0.weight", "module.att_reg_box.2.0.bias", "module.att_reg_box.3.0.weight", "module.att_reg_box.3.0.bias", "module.att_reg_box.4.0.weight", "module.att_reg_box.4.0.bias", "module.att_reg_box.5.weight", "module.att_reg_box.5.bias", "module.lstm.weight_ih_l0", "module.lstm.weight_hh_l0", "module.lstm.bias_ih_l0", "module.lstm.bias_hh_l0", "module.lstm.weight_ih_l0_reverse", "module.lstm.weight_hh_l0_reverse", "module.lstm.bias_ih_l0_reverse", "module.lstm.bias_hh_l0_reverse".

TheShadow29 commented 2 years ago

@FioPio It is because the model was trained on multiple GPUs, but your code is for a single GPU.

Simply replace:

model.load_state_dict(pretrained_model['model_state_dict'], strict=False)

with the following:

loaded_state_dict = pretrained_model['model_state_dict']
loaded_state_dict2 = {k.split('module.',1)[1]: v for k,v in loaded_state_dict.items()}

model.load_state_dict(loaded_state_dict2, strict=True)
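
If for some reason a few keys do not start with module., a slightly more defensive version of the same idea (just a sketch) is:

loaded_state_dict2 = {
    (k[len('module.'):] if k.startswith('module.') else k): v
    for k, v in loaded_state_dict.items()
}
model.load_state_dict(loaded_state_dict2, strict=True)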

FioPio commented 2 years ago

@TheShadow29 Trying that I get:

Traceback (most recent call last):
  File "code/demoFio.py", line 61, in <module>
    model.load_state_dict(loaded_state_dict2, strict=True)
  File "/home/******/anaconda3/envs/zsgnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ZSGNet:
    Missing key(s) in state_dict: "backbone.encoder.vgg.0.weight", "backbone.encoder.vgg.0.bias", "backbone.encoder.vgg.2.weight", "backbone.encoder.vgg.2.bias", "backbone.encoder.vgg.5.weight", "backbone.encoder.vgg.5.bias", "backbone.encoder.vgg.7.weight", "backbone.encoder.vgg.7.bias", "backbone.encoder.vgg.10.weight", "backbone.encoder.vgg.10.bias", "backbone.encoder.vgg.12.weight", "backbone.encoder.vgg.12.bias", "backbone.encoder.vgg.14.weight", "backbone.encoder.vgg.14.bias", "backbone.encoder.vgg.17.weight", "backbone.encoder.vgg.17.bias", "backbone.encoder.vgg.19.weight", "backbone.encoder.vgg.19.bias", "backbone.encoder.vgg.21.weight", "backbone.encoder.vgg.21.bias", "backbone.encoder.vgg.24.weight", "backbone.encoder.vgg.24.bias", "backbone.encoder.vgg.26.weight", "backbone.encoder.vgg.26.bias", "backbone.encoder.vgg.28.weight", "backbone.encoder.vgg.28.bias", "backbone.encoder.vgg.31.weight", "backbone.encoder.vgg.31.bias", "backbone.encoder.vgg.33.weight", "backbone.encoder.vgg.33.bias", "backbone.encoder.fproj1.weight", "backbone.encoder.fproj1.bias", "backbone.encoder.fproj2.weight", "backbone.encoder.fproj2.bias", "backbone.encoder.fproj3.weight", "backbone.encoder.fproj3.bias", "backbone.encoder.extras.0.weight", "backbone.encoder.extras.0.bias", "backbone.encoder.extras.1.weight", "backbone.encoder.extras.1.bias", "backbone.encoder.extras.2.weight", "backbone.encoder.extras.2.bias", "backbone.encoder.extras.3.weight", "backbone.encoder.extras.3.bias", "backbone.encoder.extras.4.weight", "backbone.encoder.extras.4.bias", "backbone.encoder.extras.5.weight", "backbone.encoder.extras.5.bias", "backbone.encoder.extras.6.weight", "backbone.encoder.extras.6.bias", "backbone.encoder.extras.7.weight", "backbone.encoder.extras.7.bias", "backbone.encoder.loc.0.weight", "backbone.encoder.loc.0.bias", "backbone.encoder.loc.1.weight", "backbone.encoder.loc.1.bias", "backbone.encoder.loc.2.weight", "backbone.encoder.loc.2.bias", "backbone.encoder.loc.3.weight", "backbone.encoder.loc.3.bias", "backbone.encoder.loc.4.weight", "backbone.encoder.loc.4.bias", "backbone.encoder.loc.5.weight", "backbone.encoder.loc.5.bias", "backbone.encoder.conf.0.weight", "backbone.encoder.conf.0.bias", "backbone.encoder.conf.1.weight", "backbone.encoder.conf.1.bias", "backbone.encoder.conf.2.weight", "backbone.encoder.conf.2.bias", "backbone.encoder.conf.3.weight", "backbone.encoder.conf.3.bias", "backbone.encoder.conf.4.weight", "backbone.encoder.conf.4.bias", "backbone.encoder.conf.5.weight", "backbone.encoder.conf.5.bias". 
    Unexpected key(s) in state_dict: "backbone.fpn.P7_2.weight", "backbone.fpn.P7_2.bias", "backbone.fpn.P6.weight", "backbone.fpn.P6.bias", "backbone.fpn.P5_1.weight", "backbone.fpn.P5_1.bias", "backbone.fpn.P5_2.weight", "backbone.fpn.P5_2.bias", "backbone.fpn.P4_1.weight", "backbone.fpn.P4_1.bias", "backbone.fpn.P4_2.weight", "backbone.fpn.P4_2.bias", "backbone.fpn.P3_1.weight", "backbone.fpn.P3_1.bias", "backbone.fpn.P3_2.weight", "backbone.fpn.P3_2.bias", "backbone.encoder.conv1.weight", "backbone.encoder.bn1.weight", "backbone.encoder.bn1.bias", "backbone.encoder.bn1.running_mean", "backbone.encoder.bn1.running_var", "backbone.encoder.bn1.num_batches_tracked", "backbone.encoder.layer1.0.conv1.weight", "backbone.encoder.layer1.0.bn1.weight", "backbone.encoder.layer1.0.bn1.bias", "backbone.encoder.layer1.0.bn1.running_mean", "backbone.encoder.layer1.0.bn1.running_var", "backbone.encoder.layer1.0.bn1.num_batches_tracked", "backbone.encoder.layer1.0.conv2.weight", "backbone.encoder.layer1.0.bn2.weight", "backbone.encoder.layer1.0.bn2.bias", "backbone.encoder.layer1.0.bn2.running_mean", "backbone.encoder.layer1.0.bn2.running_var", "backbone.encoder.layer1.0.bn2.num_batches_tracked", "backbone.encoder.layer1.0.conv3.weight", "backbone.encoder.layer1.0.bn3.weight", "backbone.encoder.layer1.0.bn3.bias", "backbone.encoder.layer1.0.bn3.running_mean", "backbone.encoder.layer1.0.bn3.running_var", "backbone.encoder.layer1.0.bn3.num_batches_tracked", "backbone.encoder.layer1.0.downsample.0.weight", "backbone.encoder.layer1.0.downsample.1.weight", "backbone.encoder.layer1.0.downsample.1.bias", "backbone.encoder.layer1.0.downsample.1.running_mean", "backbone.encoder.layer1.0.downsample.1.running_var", "backbone.encoder.layer1.0.downsample.1.num_batches_tracked", "backbone.encoder.layer1.1.conv1.weight", "backbone.encoder.layer1.1.bn1.weight", "backbone.encoder.layer1.1.bn1.bias", "backbone.encoder.layer1.1.bn1.running_mean", "backbone.encoder.layer1.1.bn1.running_var", "backbone.encoder.layer1.1.bn1.num_batches_tracked", "backbone.encoder.layer1.1.conv2.weight", "backbone.encoder.layer1.1.bn2.weight", "backbone.encoder.layer1.1.bn2.bias", "backbone.encoder.layer1.1.bn2.running_mean", "backbone.encoder.layer1.1.bn2.running_var", "backbone.encoder.layer1.1.bn2.num_batches_tracked", "backbone.encoder.layer1.1.conv3.weight", "backbone.encoder.layer1.1.bn3.weight", "backbone.encoder.layer1.1.bn3.bias", "backbone.encoder.layer1.1.bn3.running_mean", "backbone.encoder.layer1.1.bn3.running_var", "backbone.encoder.layer1.1.bn3.num_batches_tracked", "backbone.encoder.layer1.2.conv1.weight", "backbone.encoder.layer1.2.bn1.weight", "backbone.encoder.layer1.2.bn1.bias", "backbone.encoder.layer1.2.bn1.running_mean", "backbone.encoder.layer1.2.bn1.running_var", "backbone.encoder.layer1.2.bn1.num_batches_tracked", "backbone.encoder.layer1.2.conv2.weight", "backbone.encoder.layer1.2.bn2.weight", "backbone.encoder.layer1.2.bn2.bias", "backbone.encoder.layer1.2.bn2.running_mean", "backbone.encoder.layer1.2.bn2.running_var", "backbone.encoder.layer1.2.bn2.num_batches_tracked", "backbone.encoder.layer1.2.conv3.weight", "backbone.encoder.layer1.2.bn3.weight", "backbone.encoder.layer1.2.bn3.bias", "backbone.encoder.layer1.2.bn3.running_mean", "backbone.encoder.layer1.2.bn3.running_var", "backbone.encoder.layer1.2.bn3.num_batches_tracked", "backbone.encoder.layer2.0.conv1.weight", "backbone.encoder.layer2.0.bn1.weight", "backbone.encoder.layer2.0.bn1.bias", "backbone.encoder.layer2.0.bn1.running_mean", 
"backbone.encoder.layer2.0.bn1.running_var", "backbone.encoder.layer2.0.bn1.num_batches_tracked", "backbone.encoder.layer2.0.conv2.weight", "backbone.encoder.layer2.0.bn2.weight", "backbone.encoder.layer2.0.bn2.bias", "backbone.encoder.layer2.0.bn2.running_mean", "backbone.encoder.layer2.0.bn2.running_var", "backbone.encoder.layer2.0.bn2.num_batches_tracked", "backbone.encoder.layer2.0.conv3.weight", "backbone.encoder.layer2.0.bn3.weight", "backbone.encoder.layer2.0.bn3.bias", "backbone.encoder.layer2.0.bn3.running_mean", "backbone.encoder.layer2.0.bn3.running_var", "backbone.encoder.layer2.0.bn3.num_batches_tracked", "backbone.encoder.layer2.0.downsample.0.weight", "backbone.encoder.layer2.0.downsample.1.weight", "backbone.encoder.layer2.0.downsample.1.bias", "backbone.encoder.layer2.0.downsample.1.running_mean", "backbone.encoder.layer2.0.downsample.1.running_var", "backbone.encoder.layer2.0.downsample.1.num_batches_tracked", "backbone.encoder.layer2.1.conv1.weight", "backbone.encoder.layer2.1.bn1.weight", "backbone.encoder.layer2.1.bn1.bias", "backbone.encoder.layer2.1.bn1.running_mean", "backbone.encoder.layer2.1.bn1.running_var", "backbone.encoder.layer2.1.bn1.num_batches_tracked", "backbone.encoder.layer2.1.conv2.weight", "backbone.encoder.layer2.1.bn2.weight", "backbone.encoder.layer2.1.bn2.bias", "backbone.encoder.layer2.1.bn2.running_mean", "backbone.encoder.layer2.1.bn2.running_var", "backbone.encoder.layer2.1.bn2.num_batches_tracked", "backbone.encoder.layer2.1.conv3.weight", "backbone.encoder.layer2.1.bn3.weight", "backbone.encoder.layer2.1.bn3.bias", "backbone.encoder.layer2.1.bn3.running_mean", "backbone.encoder.layer2.1.bn3.running_var", "backbone.encoder.layer2.1.bn3.num_batches_tracked", "backbone.encoder.layer2.2.conv1.weight", "backbone.encoder.layer2.2.bn1.weight", "backbone.encoder.layer2.2.bn1.bias", "backbone.encoder.layer2.2.bn1.running_mean", "backbone.encoder.layer2.2.bn1.running_var", "backbone.encoder.layer2.2.bn1.num_batches_tracked", "backbone.encoder.layer2.2.conv2.weight", "backbone.encoder.layer2.2.bn2.weight", "backbone.encoder.layer2.2.bn2.bias", "backbone.encoder.layer2.2.bn2.running_mean", "backbone.encoder.layer2.2.bn2.running_var", "backbone.encoder.layer2.2.bn2.num_batches_tracked", "backbone.encoder.layer2.2.conv3.weight", "backbone.encoder.layer2.2.bn3.weight", "backbone.encoder.layer2.2.bn3.bias", "backbone.encoder.layer2.2.bn3.running_mean", "backbone.encoder.layer2.2.bn3.running_var", "backbone.encoder.layer2.2.bn3.num_batches_tracked", "backbone.encoder.layer2.3.conv1.weight", "backbone.encoder.layer2.3.bn1.weight", "backbone.encoder.layer2.3.bn1.bias", "backbone.encoder.layer2.3.bn1.running_mean", "backbone.encoder.layer2.3.bn1.running_var", "backbone.encoder.layer2.3.bn1.num_batches_tracked", "backbone.encoder.layer2.3.conv2.weight", "backbone.encoder.layer2.3.bn2.weight", "backbone.encoder.layer2.3.bn2.bias", "backbone.encoder.layer2.3.bn2.running_mean", "backbone.encoder.layer2.3.bn2.running_var", "backbone.encoder.layer2.3.bn2.num_batches_tracked", "backbone.encoder.layer2.3.conv3.weight", "backbone.encoder.layer2.3.bn3.weight", "backbone.encoder.layer2.3.bn3.bias", "backbone.encoder.layer2.3.bn3.running_mean", "backbone.encoder.layer2.3.bn3.running_var", "backbone.encoder.layer2.3.bn3.num_batches_tracked", "backbone.encoder.layer3.0.conv1.weight", "backbone.encoder.layer3.0.bn1.weight", "backbone.encoder.layer3.0.bn1.bias", "backbone.encoder.layer3.0.bn1.running_mean", "backbone.encoder.layer3.0.bn1.running_var", 
"backbone.encoder.layer3.0.bn1.num_batches_tracked", "backbone.encoder.layer3.0.conv2.weight", "backbone.encoder.layer3.0.bn2.weight", "backbone.encoder.layer3.0.bn2.bias", "backbone.encoder.layer3.0.bn2.running_mean", "backbone.encoder.layer3.0.bn2.running_var", "backbone.encoder.layer3.0.bn2.num_batches_tracked", "backbone.encoder.layer3.0.conv3.weight", "backbone.encoder.layer3.0.bn3.weight", "backbone.encoder.layer3.0.bn3.bias", "backbone.encoder.layer3.0.bn3.running_mean", "backbone.encoder.layer3.0.bn3.running_var", "backbone.encoder.layer3.0.bn3.num_batches_tracked", "backbone.encoder.layer3.0.downsample.0.weight", "backbone.encoder.layer3.0.downsample.1.weight", "backbone.encoder.layer3.0.downsample.1.bias", "backbone.encoder.layer3.0.downsample.1.running_mean", "backbone.encoder.layer3.0.downsample.1.running_var", "backbone.encoder.layer3.0.downsample.1.num_batches_tracked", "backbone.encoder.layer3.1.conv1.weight", "backbone.encoder.layer3.1.bn1.weight", "backbone.encoder.layer3.1.bn1.bias", "backbone.encoder.layer3.1.bn1.running_mean", "backbone.encoder.layer3.1.bn1.running_var", "backbone.encoder.layer3.1.bn1.num_batches_tracked", "backbone.encoder.layer3.1.conv2.weight", "backbone.encoder.layer3.1.bn2.weight", "backbone.encoder.layer3.1.bn2.bias", "backbone.encoder.layer3.1.bn2.running_mean", "backbone.encoder.layer3.1.bn2.running_var", "backbone.encoder.layer3.1.bn2.num_batches_tracked", "backbone.encoder.layer3.1.conv3.weight", "backbone.encoder.layer3.1.bn3.weight", "backbone.encoder.layer3.1.bn3.bias", "backbone.encoder.layer3.1.bn3.running_mean", "backbone.encoder.layer3.1.bn3.running_var", "backbone.encoder.layer3.1.bn3.num_batches_tracked", "backbone.encoder.layer3.2.conv1.weight", "backbone.encoder.layer3.2.bn1.weight", "backbone.encoder.layer3.2.bn1.bias", "backbone.encoder.layer3.2.bn1.running_mean", "backbone.encoder.layer3.2.bn1.running_var", "backbone.encoder.layer3.2.bn1.num_batches_tracked", "backbone.encoder.layer3.2.conv2.weight", "backbone.encoder.layer3.2.bn2.weight", "backbone.encoder.layer3.2.bn2.bias", "backbone.encoder.layer3.2.bn2.running_mean", "backbone.encoder.layer3.2.bn2.running_var", "backbone.encoder.layer3.2.bn2.num_batches_tracked", "backbone.encoder.layer3.2.conv3.weight", "backbone.encoder.layer3.2.bn3.weight", "backbone.encoder.layer3.2.bn3.bias", "backbone.encoder.layer3.2.bn3.running_mean", "backbone.encoder.layer3.2.bn3.running_var", "backbone.encoder.layer3.2.bn3.num_batches_tracked", "backbone.encoder.layer3.3.conv1.weight", "backbone.encoder.layer3.3.bn1.weight", "backbone.encoder.layer3.3.bn1.bias", "backbone.encoder.layer3.3.bn1.running_mean", "backbone.encoder.layer3.3.bn1.running_var", "backbone.encoder.layer3.3.bn1.num_batches_tracked", "backbone.encoder.layer3.3.conv2.weight", "backbone.encoder.layer3.3.bn2.weight", "backbone.encoder.layer3.3.bn2.bias", "backbone.encoder.layer3.3.bn2.running_mean", "backbone.encoder.layer3.3.bn2.running_var", "backbone.encoder.layer3.3.bn2.num_batches_tracked", "backbone.encoder.layer3.3.conv3.weight", "backbone.encoder.layer3.3.bn3.weight", "backbone.encoder.layer3.3.bn3.bias", "backbone.encoder.layer3.3.bn3.running_mean", "backbone.encoder.layer3.3.bn3.running_var", "backbone.encoder.layer3.3.bn3.num_batches_tracked", "backbone.encoder.layer3.4.conv1.weight", "backbone.encoder.layer3.4.bn1.weight", "backbone.encoder.layer3.4.bn1.bias", "backbone.encoder.layer3.4.bn1.running_mean", "backbone.encoder.layer3.4.bn1.running_var", "backbone.encoder.layer3.4.bn1.num_batches_tracked", 
"backbone.encoder.layer3.4.conv2.weight", "backbone.encoder.layer3.4.bn2.weight", "backbone.encoder.layer3.4.bn2.bias", "backbone.encoder.layer3.4.bn2.running_mean", "backbone.encoder.layer3.4.bn2.running_var", "backbone.encoder.layer3.4.bn2.num_batches_tracked", "backbone.encoder.layer3.4.conv3.weight", "backbone.encoder.layer3.4.bn3.weight", "backbone.encoder.layer3.4.bn3.bias", "backbone.encoder.layer3.4.bn3.running_mean", "backbone.encoder.layer3.4.bn3.running_var", "backbone.encoder.layer3.4.bn3.num_batches_tracked", "backbone.encoder.layer3.5.conv1.weight", "backbone.encoder.layer3.5.bn1.weight", "backbone.encoder.layer3.5.bn1.bias", "backbone.encoder.layer3.5.bn1.running_mean", "backbone.encoder.layer3.5.bn1.running_var", "backbone.encoder.layer3.5.bn1.num_batches_tracked", "backbone.encoder.layer3.5.conv2.weight", "backbone.encoder.layer3.5.bn2.weight", "backbone.encoder.layer3.5.bn2.bias", "backbone.encoder.layer3.5.bn2.running_mean", "backbone.encoder.layer3.5.bn2.running_var", "backbone.encoder.layer3.5.bn2.num_batches_tracked", "backbone.encoder.layer3.5.conv3.weight", "backbone.encoder.layer3.5.bn3.weight", "backbone.encoder.layer3.5.bn3.bias", "backbone.encoder.layer3.5.bn3.running_mean", "backbone.encoder.layer3.5.bn3.running_var", "backbone.encoder.layer3.5.bn3.num_batches_tracked", "backbone.encoder.layer4.0.conv1.weight", "backbone.encoder.layer4.0.bn1.weight", "backbone.encoder.layer4.0.bn1.bias", "backbone.encoder.layer4.0.bn1.running_mean", "backbone.encoder.layer4.0.bn1.running_var", "backbone.encoder.layer4.0.bn1.num_batches_tracked", "backbone.encoder.layer4.0.conv2.weight", "backbone.encoder.layer4.0.bn2.weight", "backbone.encoder.layer4.0.bn2.bias", "backbone.encoder.layer4.0.bn2.running_mean", "backbone.encoder.layer4.0.bn2.running_var", "backbone.encoder.layer4.0.bn2.num_batches_tracked", "backbone.encoder.layer4.0.conv3.weight", "backbone.encoder.layer4.0.bn3.weight", "backbone.encoder.layer4.0.bn3.bias", "backbone.encoder.layer4.0.bn3.running_mean", "backbone.encoder.layer4.0.bn3.running_var", "backbone.encoder.layer4.0.bn3.num_batches_tracked", "backbone.encoder.layer4.0.downsample.0.weight", "backbone.encoder.layer4.0.downsample.1.weight", "backbone.encoder.layer4.0.downsample.1.bias", "backbone.encoder.layer4.0.downsample.1.running_mean", "backbone.encoder.layer4.0.downsample.1.running_var", "backbone.encoder.layer4.0.downsample.1.num_batches_tracked", "backbone.encoder.layer4.1.conv1.weight", "backbone.encoder.layer4.1.bn1.weight", "backbone.encoder.layer4.1.bn1.bias", "backbone.encoder.layer4.1.bn1.running_mean", "backbone.encoder.layer4.1.bn1.running_var", "backbone.encoder.layer4.1.bn1.num_batches_tracked", "backbone.encoder.layer4.1.conv2.weight", "backbone.encoder.layer4.1.bn2.weight", "backbone.encoder.layer4.1.bn2.bias", "backbone.encoder.layer4.1.bn2.running_mean", "backbone.encoder.layer4.1.bn2.running_var", "backbone.encoder.layer4.1.bn2.num_batches_tracked", "backbone.encoder.layer4.1.conv3.weight", "backbone.encoder.layer4.1.bn3.weight", "backbone.encoder.layer4.1.bn3.bias", "backbone.encoder.layer4.1.bn3.running_mean", "backbone.encoder.layer4.1.bn3.running_var", "backbone.encoder.layer4.1.bn3.num_batches_tracked", "backbone.encoder.layer4.2.conv1.weight", "backbone.encoder.layer4.2.bn1.weight", "backbone.encoder.layer4.2.bn1.bias", "backbone.encoder.layer4.2.bn1.running_mean", "backbone.encoder.layer4.2.bn1.running_var", "backbone.encoder.layer4.2.bn1.num_batches_tracked", "backbone.encoder.layer4.2.conv2.weight", 
"backbone.encoder.layer4.2.bn2.weight", "backbone.encoder.layer4.2.bn2.bias", "backbone.encoder.layer4.2.bn2.running_mean", "backbone.encoder.layer4.2.bn2.running_var", "backbone.encoder.layer4.2.bn2.num_batches_tracked", "backbone.encoder.layer4.2.conv3.weight", "backbone.encoder.layer4.2.bn3.weight", "backbone.encoder.layer4.2.bn3.bias", "backbone.encoder.layer4.2.bn3.running_mean", "backbone.encoder.layer4.2.bn3.running_var", "backbone.encoder.layer4.2.bn3.num_batches_tracked", "backbone.encoder.fc.weight", "backbone.encoder.fc.bias". 

We are getting closer though!
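
For reference, a quick way to see which backbone a saved checkpoint expects is to look at the key prefixes in its state dict. This is a minimal sketch, assuming the checkpoint is a plain state_dict saved at the path used above:

import torch

# Inspect the checkpoint keys: "backbone.fpn.*" and "backbone.encoder.layer*"
# come from the Retina (ResNet + FPN) backbone, while "backbone.encoder.vgg.*"
# keys would indicate the SSD-VGG backbone.
state_dict = torch.load("tmp/models/referit_try.pth", map_location="cpu")
prefixes = sorted({".".join(k.split(".")[:2]) for k in state_dict})
print(prefixes)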

TheShadow29 commented 2 years ago

@FioPio Can you change the `cfg.mdl_to_use` line before calling `get_default_net`? It is used here: https://github.com/TheShadow29/zsgnet-pytorch/tree/master/code#L410. The unexpected keys (ResNet layers and an FPN) show the checkpoint was trained with the Retina backbone rather than SSD-VGG, so set:

cfg.mdl_to_use = 'retina'
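
With that change, a minimal end-to-end loading sketch (untested here, and assuming the checkpoint at tmp/models/referit_try.pth was indeed trained with the Retina backbone, as the unexpected keys suggest) would look like:

import torch
from mdl import get_default_net
from extended_config import cfg as conf

cfg = conf
cfg.mdl_to_use = 'retina'   # must match the backbone the checkpoint was trained with
cfg.ds_to_use = 'refclef'
cfg.num_gpus = 1

model = get_default_net(cfg=cfg)
state_dict = torch.load("tmp/models/referit_try.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode before passing image/query pairs through the model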
FioPio commented 2 years ago

Thank you, it is working! I am closing this issue.

Thank you!

TheShadow29 commented 2 years ago

Awesome! Feel free to re-open the issue if there is any further trouble.