PyTorch implementation of the paper "Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval", CVPR 2019.
Problems with the model



yoooo233 commented 5 years ago

Thanks for your code, but there are two problems I don't know how to solve when using it.

1. When I try to test with the provided pretrained model (after downloading it with the bash command), I get the following error. What should I do to use the pretrained model?

```
  File "src/", line 322, in <module>
  File "src/", line 196, in main
  File "/home/USR/anaconda3/envs/torch11/lib/python3.6/site-packages/torch/nn/modules/", line 777, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SEM_PCYC:
        size mismatch for classifier_sk.weight: copying a param with shape torch.Size([220, 512]) from checkpoint, the shape in current model is torch.Size([27, 512]).
        size mismatch for classifier_im.weight: copying a param with shape torch.Size([220, 512]) from checkpoint, the shape in current model is torch.Size([27, 512]).
        size mismatch for classifier_se.weight: copying a param with shape torch.Size([220, 64]) from checkpoint, the shape in current model is torch.Size([27, 64]).
```
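The three mismatching tensors differ only in their first dimension. Assuming the `classifier_*` layers are `nn.Linear` layers (whose weight has shape `(out_features, in_features)`), that dimension is the number of training classes: the checkpoint was trained with 220 classes, while the locally built model has 27. A small hypothetical helper, `diff_state_dict_shapes` (not part of the repository), makes such comparisons quick; it works on plain `{name: shape}` dicts so it runs without loading a checkpoint.

```python
def diff_state_dict_shapes(ckpt_shapes, model_shapes):
    """Compare two {param_name: shape} dicts.

    Returns (mismatched, only_in_ckpt, only_in_model), where `mismatched` lists
    (name, checkpoint_shape, model_shape) for parameters present in both dicts
    whose shapes differ.
    """
    common = sorted(ckpt_shapes.keys() & model_shapes.keys())
    mismatched = [
        (name, tuple(ckpt_shapes[name]), tuple(model_shapes[name]))
        for name in common
        if tuple(ckpt_shapes[name]) != tuple(model_shapes[name])
    ]
    only_in_ckpt = sorted(ckpt_shapes.keys() - model_shapes.keys())
    only_in_model = sorted(model_shapes.keys() - ckpt_shapes.keys())
    return mismatched, only_in_ckpt, only_in_model


# Shapes copied from the RuntimeError in the traceback. With PyTorch you could
# build such dicts as {name: tuple(t.shape) for name, t in state_dict.items()}.
ckpt_shapes = {
    "classifier_sk.weight": (220, 512),
    "classifier_im.weight": (220, 512),
    "classifier_se.weight": (220, 64),
}
model_shapes = {
    "classifier_sk.weight": (27, 512),
    "classifier_im.weight": (27, 512),
    "classifier_se.weight": (27, 64),
}

mismatched, only_in_ckpt, only_in_model = diff_state_dict_shapes(ckpt_shapes, model_shapes)
for name, old, new in mismatched:
    # Assuming nn.Linear weights: dim 0 is out_features, i.e. the number of classes.
    print(f"{name}: checkpoint has {old[0]} classes, current model has {new[0]}")
```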

2. When I train the model on the default datasets (with the default config, except that I reduced the batch size from 128 to 64 because I'm using an RTX 2070 Super), the mAP computed on the validation set after each epoch is always 1.0000. As a result, the model saver never updates the checkpoint, because `if map > best_map:` is always false (in practice it only saves the model from epoch 1). I would appreciate it if you could tell me how to make the mAP computation correct, other than changing `>` to `>=`.

Apart from that, the mAP and the other metrics look normal when I evaluate the newly trained model.

```
[Train] Epoch: [99][100/100]    Time 0.363 (0.372)      Gen. Loss 144.5337 (143.8287)   Disc. Loss 0.2694 (0.2813)
[Test][Sketch] Epoch: [99][1/2] Time 0.733 (0.733)
[Test][Sketch] Epoch: [99][2/2] Time 0.044 (0.388)
[Test][Image] Epoch: [99][1/2]  Time 0.421 (0.399)
[Test][Image] Epoch: [99][2/2]  Time 0.089 (0.322)
Computing evaluation metrics...Done
mAP@all on validation set after 99 epochs: 1.0000 (real), 1.0000 (binary)
[Train] Epoch: [100][1/100]     Time 1.295 (1.295)      Gen. Loss 135.4901 (135.4901)   Disc. Loss 0.2673 (0.2673)
```
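A constant 1.0000 is exactly what mAP produces when every gallery item has the same class as every query, for example if the validation split ends up containing a single class. The sketch below uses the textbook definition of average precision (an assumption; it is not necessarily how this repository computes mAP) to show that with one class the score is 1.0 regardless of how the model ranks the gallery, while with two classes a bad ranking is penalized. `average_precision` and `mean_average_precision` are hypothetical helpers.

```python
def average_precision(relevant):
    """AP for one query, given 0/1 relevance flags in ranked order."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevant, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant position
    return sum(precisions) / hits if hits else 0.0


def mean_average_precision(query_labels, gallery_labels, scores):
    """mAP@all: for each query, rank the whole gallery by descending score."""
    aps = []
    for q_label, q_scores in zip(query_labels, scores):
        order = sorted(range(len(gallery_labels)), key=lambda j: -q_scores[j])
        relevant = [gallery_labels[j] == q_label for j in order]
        aps.append(average_precision(relevant))
    return sum(aps) / len(aps)


# Single class: every retrieved item is relevant, so any ranking gives AP = 1.
single_class_map = mean_average_precision(
    ["bench", "bench"], ["bench", "bench", "bench"],
    [[0.1, 0.5, 0.3], [0.9, 0.2, 0.4]],
)
# Two classes: the wrong item is ranked first, so the score drops.
two_class_map = mean_average_precision(["bench"], ["bench", "moon"], [[0.1, 0.9]])

print(single_class_map)  # 1.0
print(two_class_map)     # 0.5
```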
AnjanDutta commented 5 years ago

Why do you have size torch.Size([27, 512])? It should be torch.Size([220, 512]). There is a problem with your dataset. You have so few examples that they fit in only two batches, which is very strange.

yoooo233 commented 5 years ago

I'm also puzzled about why this happens. Could you tell me which function sets or computes the value 220 (or 27)? I couldn't find such a default value in the model.

I found that the value 27 probably comes from the parameter `num_clss`, which is computed as follows at line 154:

```python
# Number of classes
params_model['num_clss'] = len(dict_clss)
```

Then I checked how `dict_clss` is defined (starting at line 109) and added a print right after the original line 128 to show its contents:

```python
if args.gzs_sbir > 0:
    perc = 0.2
    _, idx_sk = np.unique(splits['tr_fls_sk'], return_index=True)
    tr_fls_sk_ = splits['tr_fls_sk'][idx_sk]
    tr_clss_sk_ = splits['tr_clss_sk'][idx_sk]
    _, idx_im = np.unique(splits['tr_fls_im'], return_index=True)
    tr_fls_im_ = splits['tr_fls_im'][idx_im]
    tr_clss_im_ = splits['tr_clss_im'][idx_im]
    if args.dataset == 'Sketchy' and args.filter_sketch:
        _, idx_sk = np.unique([f.split('-')[0] for f in tr_fls_sk_], return_index=True)
        tr_fls_sk_ = tr_fls_sk_[idx_sk]
        tr_clss_sk_ = tr_clss_sk_[idx_sk]
    idx_sk = np.sort(np.random.choice(tr_fls_sk_.shape[0], int(perc * splits['te_fls_sk'].shape[0]), replace=False))
    idx_im = np.sort(np.random.choice(tr_fls_im_.shape[0], int(perc * splits['te_fls_im'].shape[0]), replace=False))
    splits['te_fls_sk'] = np.concatenate((tr_fls_sk_[idx_sk], splits['te_fls_sk']), axis=0)
    splits['te_clss_sk'] = np.concatenate((tr_clss_sk_[idx_sk], splits['te_clss_sk']), axis=0)
    splits['te_fls_im'] = np.concatenate((tr_fls_im_[idx_im], splits['te_fls_im']), axis=0)
    splits['te_clss_im'] = np.concatenate((tr_clss_im_[idx_im], splits['te_clss_im']), axis=0)

# class dictionary
dict_clss = utils.create_dict_texts(splits['tr_clss_im'])
# added print to find out what dict_clss is
print("dict_clss is ", dict_clss)
```

The output of the print is:

```
Loading data...dict_clss is  {'bench': 0, 'cigarette': 1, 'diamond': 2, 'door_handle': 3, 'ear': 4, 'eye': 5, 'face': 6, 'feather': 7, 'fire_hydrant': 8, 'flower_with_stem': 9, 'flying_saucer': 10, 'hand': 11, 'human_skeleton': 12, 'moon': 13, 'mouth': 14, 'nose': 15, 'person_sitting': 16, 'person_walking': 17, 'power_outlet': 18, 'present': 19, 'santa_claus': 20, 'skull': 21, 'snowman': 22, 'sponge_bob': 23, 'sun': 24, 'teddy_bear': 25, 'tooth': 26}
```

There happen to be exactly 27 class names here, so I guess this is what causes the difference. I couldn't find any file that defines this list, and these class names are a subset of the TU-Berlin dataset. I'm now checking code such as the function `load_files_tuberlin_zeroshot`, which produces `tr_clss_im`, to find out why I get this strange 27. Also, when I successfully run the evaluation with my own trained model, the print shows the same 27 classes.

AnjanDutta commented 5 years ago

Can you please check how many classes there are in your TU-Berlin dataset folder?

yoooo233 commented 5 years ago


AnjanDutta commented 5 years ago

There should be exactly 250 classes in the TU-Berlin folder. I suggest running the code without setting the gzs-sbir flag. There should be exactly 220 classes in the `dict_clss` dictionary.

yoooo233 commented 5 years ago

I'm sorry, I had also counted list.txt as a class; there are actually 250 classes.
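Counting only subdirectories avoids the off-by-one that list.txt causes. A minimal sketch with `pathlib`, assuming one subfolder per class; `count_class_dirs` is a hypothetical helper, and the `dataset/TU-Berlin/images` path in the last comment is an assumption based on the default `photo_dir='images'` of `load_files_tuberlin_zeroshot`.

```python
import tempfile
from pathlib import Path


def count_class_dirs(root):
    """Count the immediate subdirectories of `root` (one per class); plain files such as list.txt are ignored."""
    return sum(1 for p in Path(root).iterdir() if p.is_dir())


# Self-contained demo on a throwaway folder with three classes and a list.txt.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("airplane", "bench", "cat"):
        (Path(tmp) / name).mkdir()
    (Path(tmp) / "list.txt").write_text("airplane\nbench\ncat\n")
    demo_count = count_class_dirs(tmp)

print(demo_count)  # 3

# On a real copy (path is an assumption): count_class_dirs("dataset/TU-Berlin/images") should be 250.
```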

yoooo233 commented 5 years ago

Thanks for the tip that made me check the dataset again. The function `load_files_tuberlin_zeroshot` only loads image files with the .jpg extension. However, the images used in this project actually come from the extended TU-Berlin dataset, where many of the image files, across most classes, use the .JPEG extension instead, so they are never loaded (even though they are the same format). I checked the number of classes loaded for both sketches and images: all 250 sketch classes are found, but only 31 image classes, and my strange 27 is simply int(31 * 0.88).
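To see which image classes the `'*.jpg'` pattern in `load_files_tuberlin_zeroshot` skips, one can tally the file extensions in each class folder. A minimal sketch, assuming one subfolder per class under the images directory; `extension_report` and `classes_without_jpg` are hypothetical helpers, and the suffix comparison is case-sensitive, matching how glob behaves on Linux.

```python
import tempfile
from collections import Counter
from pathlib import Path


def extension_report(images_root):
    """Map each class folder under images_root to a Counter of the file suffixes it contains."""
    return {
        class_dir.name: Counter(f.suffix for f in class_dir.iterdir() if f.is_file())
        for class_dir in sorted(p for p in Path(images_root).iterdir() if p.is_dir())
    }


def classes_without_jpg(report):
    """Classes with no file ending exactly in '.jpg'; on a case-sensitive
    file system a '*.jpg' glob finds nothing in them, so the class disappears."""
    return [cls for cls, suffixes in report.items() if suffixes[".jpg"] == 0]


# Demo: 'bench' mixes .jpg and .JPEG, 'moon' only has .JPEG files.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("bench/a.jpg", "bench/b.JPEG", "moon/c.JPEG", "moon/d.JPEG"):
        path = Path(tmp) / rel
        path.parent.mkdir(exist_ok=True)
        path.touch()
    demo_report = extension_report(tmp)

missing = classes_without_jpg(demo_report)
print(missing)  # ['moon']
```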

```python
import glob
import os

import numpy as np


def load_files_tuberlin_zeroshot(root_path, photo_dir='images', sketch_dir='sketches', photo_sd='', sketch_sd=''):

    path_im = os.path.join(root_path, photo_dir, photo_sd)
    path_sk = os.path.join(root_path, sketch_dir, sketch_sd)
    print("path_sk is", path_sk)

    # image files and classes (only files ending in lowercase '.jpg' are matched)
    fls_im = glob.glob(os.path.join(path_im, '*', '*.jpg'))
    fls_im = np.array([os.path.join(f.split('/')[-2], f.split('/')[-1]) for f in fls_im])
    clss_im = np.array([f.split('/')[-2] for f in fls_im])
    # sketch files and classes
    fls_sk = glob.glob(os.path.join(path_sk, '*', '*.png'))
    fls_sk = np.array([os.path.join(f.split('/')[-2], f.split('/')[-1]) for f in fls_sk])
    clss_sk = np.array([f.split('/')[-2] for f in fls_sk])

    # all the unique classes
    classes = np.unique(clss_im)
    classes_sk = np.unique(clss_sk)
    # debug prints: number of classes found for images and for sketches
    print("num_clss_im--", len(classes))
    print("num_clss_sk--", len(classes_sk))
    # divide the classes, done according to the "Zero-Shot Sketch-Image Hashing" paper
    tr_classes = np.random.choice(classes, int(0.88 * len(classes)), replace=False)
    va_classes = np.random.choice(np.setdiff1d(classes, tr_classes), int(0.06 * len(classes)), replace=False)
    te_classes = np.setdiff1d(classes, np.union1d(tr_classes, va_classes))
    # ... (rest of the function unchanged)
```
```
Loading data...path_sk is /home/USR/Desktop/codes/Semantical_ZSIR/sem-pcyc-master/dataset/TU-Berlin/sketches/
num_clss_im-- 31
num_clss_sk-- 250
dict_clss is  {'bench': 0, 'cigarette': 1, 'diamond': 2, 'door_handle': 3, 'ear': 4, 'eye': 5, 'face': 6, 'feather': 7, 'fire_hydrant': 8, 'flower_with_stem': 9, 'flying_saucer': 10, 'hand': 11, 'human_skeleton': 12, 'moon': 13, 'mouth': 14, 'nose': 15, 'person_sitting': 16, 'person_walking': 17, 'power_outlet': 18, 'present': 19, 'santa_claus': 20, 'skull': 21, 'snowman': 22, 'sponge_bob': 23, 'sun': 24, 'teddy_bear': 25, 'tooth': 26}
```
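The class counts follow directly from the 0.88 / 0.06 / remainder split in `load_files_tuberlin_zeroshot`. A minimal sketch that reproduces only the counts (`split_sizes` is a hypothetical helper; the real code also picks *which* classes at random with `np.random.choice`, which does not change the counts):

```python
def split_sizes(num_classes, tr_ratio=0.88, va_ratio=0.06):
    """Number of train/val/test classes produced by the class split."""
    n_tr = int(tr_ratio * num_classes)  # size of tr_classes
    n_va = int(va_ratio * num_classes)  # size of va_classes
    n_te = num_classes - n_tr - n_va    # te_classes is the remainder
    return n_tr, n_va, n_te


print(split_sizes(250))  # (220, 15, 15): all TU-Berlin image classes found
print(split_sizes(31))   # (27, 1, 3): only the 31 classes that contain .jpg files
```

With 31 classes the validation split gets a single class, which would also be consistent with the constant validation mAP of 1.0000 in the training log earlier in the thread.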
yoooo233 commented 5 years ago

Okay, after renaming all images from .JPEG to .jpg, the pretrained model loads and the validation mAP after each training epoch is now computed correctly.
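For reference, the extension fix alone can be sketched with `pathlib`. This is a hypothetical helper, not the repository's preprocessing script, and it only changes file extensions; it does not perform any class renaming. It defaults to a dry run and refuses to overwrite an existing `.jpg`.

```python
import tempfile
from pathlib import Path


def rename_jpeg_to_jpg(images_root, dry_run=True):
    """Rename every *.JPEG / *.jpeg file under images_root to *.jpg.

    Returns the list of (old_path, new_path) pairs. With dry_run=True nothing
    is renamed, so the list can be inspected first.
    """
    renamed = []
    # sorted() materializes the listing before any rename happens.
    for path in sorted(Path(images_root).rglob("*")):
        if path.is_file() and path.suffix.lower() == ".jpeg":
            target = path.with_suffix(".jpg")
            if target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            if not dry_run:
                path.rename(target)
            renamed.append((path, target))
    return renamed


# Demo on a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("bench/a.JPEG", "bench/b.jpg", "moon/c.jpeg"):
        p = Path(tmp) / rel
        p.parent.mkdir(exist_ok=True)
        p.touch()
    planned = rename_jpeg_to_jpg(tmp)              # dry run: nothing renamed yet
    done = rename_jpeg_to_jpg(tmp, dry_run=False)  # actually rename
    names_after = sorted(p.name for p in Path(tmp).rglob("*") if p.is_file())

print(len(planned), len(done))  # 2 2
print(names_after)              # ['a.jpg', 'b.jpg', 'c.jpg']
```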

AnjanDutta commented 5 years ago

If you had used my script, the renaming from .JPEG to .jpg would have been done for you (line 167 of that file). To use the semantic information, some of the TU-Berlin dataset classes also need to be renamed, so make sure you run this script; otherwise you may run into bugs later. I am closing this issue.