AnjanDutta / sem-pcyc

PyTorch implementation of the paper "Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval", CVPR 2019.
MIT License

Problems about the model #11

Closed yoooo233 closed 5 years ago

yoooo233 commented 5 years ago

Thanks for your code, but there are 2 problems I don't know how to solve when using it.

1. When I try to use the given pretrained model for testing, after downloading it with the bash command, I get the following error. What should I do if I want to use the pretrained model?

  File "src/test.py", line 322, in <module>
    main()
  File "src/test.py", line 196, in main
    sem_pcyc_model.load_state_dict(checkpoint['state_dict'])
  File "/home/USR/anaconda3/envs/torch11/lib/python3.6/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SEM_PCYC:
        size mismatch for classifier_sk.weight: copying a param with shape torch.Size([220, 512]) from checkpoint, the shape in current model is torch.Size([27, 512]).
        size mismatch for classifier_im.weight: copying a param with shape torch.Size([220, 512]) from checkpoint, the shape in current model is torch.Size([27, 512]).
        size mismatch for classifier_se.weight: copying a param with shape torch.Size([220, 64]) from checkpoint, the shape in current model is torch.Size([27, 64]).

2. When I train the model on the default datasets (with the default config, except the batch size reduced from 128 to 64 because I'm using a 2070s), the "mAP" computed between epochs is always 1.000. So the model saver never updates the model, because `if map > best_map:` is false (it actually only saves the model from epoch 1). I would appreciate it if you could point out how to make it calculate the mAP correctly, other than changing ">" to ">=".

What's more, the mAP and other metrics seem normal when running test.py with the newly trained model.


[Train] Epoch: [99][100/100]    Time 0.363 (0.372)      Gen. Loss 144.5337 (143.8287)   Disc. Loss 0.2694 (0.2813)      
***Validation***
[Test][Sketch] Epoch: [99][1/2] Time 0.733 (0.733)      
[Test][Sketch] Epoch: [99][2/2] Time 0.044 (0.388)      
[Test][Image] Epoch: [99][1/2]  Time 0.421 (0.399)      
[Test][Image] Epoch: [99][2/2]  Time 0.089 (0.322)      
Computing evaluation metrics...Done
mAP@all on validation set after 99 epochs: 1.0000 (real), 1.0000 (binary)
[Train] Epoch: [100][1/100]     Time 1.295 (1.295)      Gen. Loss 135.4901 (135.4901)   Disc. Loss 0.2673 (0.2673)     
AnjanDutta commented 5 years ago

Why do you have size torch.Size([27, 512])? It should be torch.Size([220, 512]). There is a problem with your dataset. You have so few examples that they fit within only two batches, which is very strange.
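
To see at a glance which parameters disagree between a checkpoint and a freshly built model, the shapes can be compared before calling `load_state_dict`. The following is only a minimal sketch: `shape_mismatches` is a hypothetical helper that is not part of the repository, and it works on plain dictionaries of shapes so that it runs without PyTorch. The shapes in the demo are copied from the error message above.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters whose shapes differ."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# With PyTorch, the two inputs could be built like this (assumption, not executed here):
#   checkpoint_shapes = {k: tuple(v.shape) for k, v in checkpoint['state_dict'].items()}
#   model_shapes = {k: tuple(v.shape) for k, v in sem_pcyc_model.state_dict().items()}

# Shapes copied from the error message in this issue.
checkpoint_shapes = {
    "classifier_sk.weight": (220, 512),
    "classifier_im.weight": (220, 512),
    "classifier_se.weight": (220, 64),
}
model_shapes = {
    "classifier_sk.weight": (27, 512),
    "classifier_im.weight": (27, 512),
    "classifier_se.weight": (27, 64),
}

for name, (ckpt, model) in shape_mismatches(checkpoint_shapes, model_shapes).items():
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

Since only the first dimension differs (220 vs 27), the mismatch points at the number of classes the model was built with rather than at the network architecture.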

yoooo233 commented 5 years ago

I'm also puzzled about why this happens. Can you tell me which function sets or computes the values 220 and 27? I didn't find such a default value in the model.

I found that the value 27 may come from the parameter 'num_clss', which is computed as follows at line 154 of test.py:

    # Number of classes
    params_model['num_clss'] = len(dict_clss)

Then I checked how dict_clss is defined and added a print right after it is built to show its contents. The block starts at line 109 of test.py, and the print is added after the original line 128:

    if args.gzs_sbir > 0:
        perc = 0.2
        _, idx_sk = np.unique(splits['tr_fls_sk'], return_index=True)
        tr_fls_sk_ = splits['tr_fls_sk'][idx_sk]
        tr_clss_sk_ = splits['tr_clss_sk'][idx_sk]
        _, idx_im = np.unique(splits['tr_fls_im'], return_index=True)
        tr_fls_im_ = splits['tr_fls_im'][idx_im]
        tr_clss_im_ = splits['tr_clss_im'][idx_im]
        if args.dataset == 'Sketchy' and args.filter_sketch:
            _, idx_sk = np.unique([f.split('-')[0] for f in tr_fls_sk_], return_index=True)
            tr_fls_sk_ = tr_fls_sk_[idx_sk]
            tr_clss_sk_ = tr_clss_sk_[idx_sk]
        idx_sk = np.sort(np.random.choice(tr_fls_sk_.shape[0], int(perc * splits['te_fls_sk'].shape[0]), replace=False))
        idx_im = np.sort(np.random.choice(tr_fls_im_.shape[0], int(perc * splits['te_fls_im'].shape[0]), replace=False))
        splits['te_fls_sk'] = np.concatenate((tr_fls_sk_[idx_sk], splits['te_fls_sk']), axis=0)
        splits['te_clss_sk'] = np.concatenate((tr_clss_sk_[idx_sk], splits['te_clss_sk']), axis=0)
        splits['te_fls_im'] = np.concatenate((tr_fls_im_[idx_im], splits['te_fls_im']), axis=0)
        splits['te_clss_im'] = np.concatenate((tr_clss_im_[idx_im], splits['te_clss_im']), axis=0)

    # class dictionary
    dict_clss = utils.create_dict_texts(splits['tr_clss_im'])
    #added print to find out what is dict_clss
    print ("dict_clss is ",(dict_clss))

The output of the print is:

Loading data...dict_clss is  {'bench': 0, 'cigarette': 1, 'diamond': 2, 'door_handle': 3, 'ear': 4, 'eye': 5, 'face': 6, 'feather': 7, 'fire_hydrant': 8, 'flower_with_stem': 9, 'flying_saucer': 10, 'hand': 11, 'human_skeleton': 12, 'moon': 13, 'mouth': 14, 'nose': 15, 'person_sitting': 16, 'person_walking': 17, 'power_outlet': 18, 'present': 19, 'santa_claus': 20, 'skull': 21, 'snowman': 22, 'sponge_bob': 23, 'sun': 24, 'teddy_bear': 25, 'tooth': 26}

There happen to be 27 class names here, so my guess is that the mismatch comes from this. I didn't find any file containing this list, and these class names are a subset of the TU-Berlin dataset. I'm now checking the code in utils.py, such as the function load_files_tuberlin_zeroshot, which produces 'tr_clss_im', to find out why I get this strange 27. What's more, when I successfully run test.py with my own trained model, the print shows the same 27 classes.

AnjanDutta commented 5 years ago

Can you please check how many classes are there in your TU-Berlin dataset folder?

yoooo233 commented 5 years ago

251

AnjanDutta commented 5 years ago

There should be exactly 250 classes in the TU-Berlin folder. I suggest running the code without setting the gzs-sbir flag. There should be exactly 220 classes in the dict_clss dictionary.

yoooo233 commented 5 years ago

I'm sorry, I counted list.txt as well. There are actually 250 classes.
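
A stray file such as list.txt is easy to miscount when listing a dataset folder. The following minimal sketch counts only subdirectories. `count_class_dirs` is a hypothetical helper, and the demo runs on a temporary folder rather than on the real dataset.

```python
import os
import tempfile


def count_class_dirs(root):
    """Count only subdirectories of `root`; plain files such as list.txt are ignored."""
    return sum(1 for entry in os.scandir(root) if entry.is_dir())


# Tiny self-contained demo on a temporary folder (not the real dataset).
with tempfile.TemporaryDirectory() as demo_root:
    for cls in ("bench", "moon", "sun"):
        os.mkdir(os.path.join(demo_root, cls))
    open(os.path.join(demo_root, "list.txt"), "w").close()

    print(len(os.listdir(demo_root)))   # counts the stray file too
    print(count_class_dirs(demo_root))  # counts class folders only
```

Pointing the same function at the sketch and image folders of the dataset would give the class-folder count directly.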

yoooo233 commented 5 years ago

Thanks for the tips that led me to check the dataset again. The function load_files_tuberlin_zeroshot in utils.py only loads image files with the .jpg extension. But the image part used in this project actually comes from Extended TU-Berlin, where some files in some (most) classes have the extension .JPEG, so they are not loaded (although they are the same format). I checked the number of classes loaded for both sketches and images: only 31 image classes are found, so my strange 27 is int(0.88 * 31) = 27.

def load_files_tuberlin_zeroshot(root_path, photo_dir='images', sketch_dir='sketches', photo_sd='', sketch_sd=''):

    path_im = os.path.join(root_path, photo_dir, photo_sd)
    path_sk = os.path.join(root_path, sketch_dir, sketch_sd)
    print("path_sk is",path_sk)

    # image files and classes
    fls_im = glob.glob(os.path.join(path_im, '*', '*.jpg'))
    fls_im = np.array([os.path.join(f.split('/')[-2], f.split('/')[-1]) for f in fls_im])
    clss_im = np.array([f.split('/')[-2] for f in fls_im])
    #sketch files and classes
    fls_sk = glob.glob(os.path.join(path_sk, '*', '*.png'))
    fls_sk = np.array([os.path.join(f.split('/')[-2], f.split('/')[-1]) for f in fls_sk])
    clss_sk = np.array([f.split('/')[-2] for f in fls_sk])

    # all the unique classes
    classes = np.unique(clss_im)
    classes_sk = np.unique(clss_sk)
    print("num_clss_im--",len(classes))
    print("num_clss_sk--", len(classes_sk))
    #print("num_clss_im--",len(classes),"classes_im are as:",classes)
    #print("num_clss_sk--",len(classes_sk),"classes_sk are as:", classes_sk)
    # divide the classes, done according to the "Zero-Shot Sketch-Image Hashing" paper
    np.random.seed(0)
    tr_classes = np.random.choice(classes, int(0.88 * len(classes)), replace=False)
    va_classes = np.random.choice(np.setdiff1d(classes, tr_classes), int(0.06 * len(classes)), replace=False)
    te_classes = np.setdiff1d(classes, np.union1d(tr_classes, va_classes))

The output is:

Loading data...path_sk is /home/USR/Desktop/codes/Semantical_ZSIR/sem-pcyc-master/dataset/TU-Berlin/sketches/
num_clss_im-- 31
num_clss_sk-- 250
dict_clss is  {'bench': 0, 'cigarette': 1, 'diamond': 2, 'door_handle': 3, 'ear': 4, 'eye': 5, 'face': 6, 'feather': 7, 'fire_hydrant': 8, 'flower_with_stem': 9, 'flying_saucer': 10, 'hand': 11, 'human_skeleton': 12, 'moon': 13, 'mouth': 14, 'nose': 15, 'person_sitting': 16, 'person_walking': 17, 'power_outlet': 18, 'present': 19, 'santa_claus': 20, 'skull': 21, 'snowman': 22, 'sponge_bob': 23, 'sun': 24, 'teddy_bear': 25, 'tooth': 26}
Done
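
The class counts follow directly from the `int(fraction * len(classes))` arithmetic in the split code quoted above. A minimal sketch of that arithmetic follows. `split_sizes` is a hypothetical helper written for illustration, and it reproduces only the sizes, not the random choice of classes.

```python
def split_sizes(num_classes, train_frac=0.88, val_frac=0.06):
    """Mirror the int(frac * len(classes)) arithmetic from the quoted split code."""
    n_train = int(train_frac * num_classes)
    n_val = int(val_frac * num_classes)
    n_test = num_classes - n_train - n_val  # whatever is left goes to the test split
    return n_train, n_val, n_test


# 31 is the number of image classes actually found; 250 is the expected number.
for n in (31, 250):
    print(n, split_sizes(n))
```

With 31 image classes the training split has 27 classes, which matches the shape in the error message. With all 250 classes it has 220, which matches the pretrained checkpoint.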

yoooo233 commented 5 years ago

Okay, after renaming all images from .JPEG to .jpg, the pretrained model can be used and the mAP between two training epochs is now calculated correctly.
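
A plausible reason the validation mAP was stuck at 1.0000 before the fix (an inference from the split code quoted above, not something confirmed in this thread): with only 31 image classes, `int(0.06 * 31)` leaves a single validation class, so every gallery image is relevant to every query and any ranking is perfect. The sketch below uses a generic textbook average precision, not the repository's own metric code, and the function names are hypothetical.

```python
def average_precision(relevance):
    """AP for one query, given a ranked list of 0/1 relevance flags."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(query_labels, ranked_gallery_labels):
    """mAP where ranked_gallery_labels[i] is the ranked gallery labels for query i."""
    aps = [
        average_precision([int(g == q) for g in ranking])
        for q, ranking in zip(query_labels, ranked_gallery_labels)
    ]
    return sum(aps) / len(aps)


# One validation class: every gallery item matches every query, in any order.
single = mean_average_precision(["moon", "moon"], [["moon"] * 4, ["moon"] * 4])
# Two classes: ranking quality now matters.
mixed = mean_average_precision(["moon"], [["sun", "moon", "sun", "moon"]])
print(single, mixed)
```

Under that assumption the score cannot drop below 1.0 regardless of how well the model ranks, which would also explain why `if map > best_map:` never fired after the first epoch.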

AnjanDutta commented 5 years ago

If you had used my download_datasets.sh script, the renaming from .JPEG to .jpg is done on line 167 of that file. To use the semantic information, we also need to rename some of the TU-Berlin dataset classes, so make sure you run this script; otherwise you may hit further bugs later. I am closing this issue.
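
For readers who already have the images on disk, the extension fix alone could be sketched in Python as below. This is not the download_datasets.sh script and does not perform the class renaming mentioned above. `normalize_jpeg_extensions` is a hypothetical helper, and the demo runs on a temporary folder.

```python
import tempfile
from pathlib import Path


def normalize_jpeg_extensions(root):
    """Rename *.JPEG / *.jpeg / *.JPG files under `root` to *.jpg; return the count."""
    renamed = 0
    # sorted() materializes the listing before any file is renamed.
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix != ".jpg" and path.suffix.lower() in (".jpeg", ".jpg"):
            target = path.with_suffix(".jpg")
            if target.exists():
                continue  # never overwrite an existing file
            path.rename(target)
            renamed += 1
    return renamed


# Tiny self-contained demo on a temporary folder (not the real dataset).
with tempfile.TemporaryDirectory() as demo_root:
    cls_dir = Path(demo_root) / "images" / "moon"
    cls_dir.mkdir(parents=True)
    (cls_dir / "a.JPEG").touch()
    (cls_dir / "b.jpg").touch()
    print(normalize_jpeg_extensions(demo_root))
    print(sorted(p.name for p in cls_dir.iterdir()))
```

Running the provided script remains the safer route, since it also handles the class renaming that the semantic model depends on.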