gidariss / FewShotWithoutForgetting


Implementation differs from what is reported in the paper #7

Closed kailigo closed 6 years ago

kailigo commented 6 years ago

Hi, I am studying your code. I found some places in the code that differ from what was reported in the paper.

1) There are overlaps between the base classes and the novel classes. In the implementation, some classes are used both as novel classes and as base classes, which is not the case as described in the paper.

2) The weights obtained in the first training stage are not used in the second stage. The weights in the second stage are randomly sampled from a normal distribution, rather than being the ones obtained in the first stage.

Could you please explain the purpose of doing so? Thanks.

Another place I am unclear about is that the novel classes seem to always be the last five classes (labeled 59~63), which should not be the case in practice.

I will keep reading your code -- maybe I misunderstood something. But your explanations would help me fully understand your algorithm. Thanks,

gidariss commented 6 years ago

Hi @kailigo,

About your 1st question. During the training procedure, the training categories are used both as base categories and as "fake" novel categories. This is necessary in order to train the few-shot classification weight generator (for more details, see the 2nd training stage paragraph in section 3.3 of the paper: https://arxiv.org/pdf/1804.09458.pdf). During testing, however, the base and novel categories do not overlap. Specifically, during testing the base categories are the training categories used during the training procedure, and the novel categories are sampled from the test or validation categories.

About your 2nd question. The weights obtained by the first training stage are loaded by specifying in the configuration file the 'pretrained' key of a network. For example, in the following configuration file: https://github.com/gidariss/FewShotWithoutForgetting/blob/master/config/miniImageNet_Conv128CosineClassifierGenWeightAttN1.py

the paths of the checkpoint files from which the parameters will be loaded are specified on lines 33 (https://github.com/gidariss/FewShotWithoutForgetting/blob/master/config/miniImageNet_Conv128CosineClassifierGenWeightAttN1.py#L33) and 37 (https://github.com/gidariss/FewShotWithoutForgetting/blob/master/config/miniImageNet_Conv128CosineClassifierGenWeightAttN1.py#L37).

The parameters are loaded (when specified) in the following places in the code: https://github.com/gidariss/FewShotWithoutForgetting/blob/master/algorithms/Algorithm.py#L93 and https://github.com/gidariss/FewShotWithoutForgetting/blob/master/algorithms/Algorithm.py#L98
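
As a rough sketch of what that loading amounts to (not the repo's actual code; the 'network' key used to unpack the checkpoint is an assumption for illustration):

    import torch

    def load_pretrained(network, checkpoint_path):
        # Read the checkpoint from disk; map_location keeps this runnable without a GPU.
        checkpoint = torch.load(checkpoint_path, map_location='cpu')
        # Checkpoints are often stored as {'network': state_dict, ...}; that key
        # is an assumption here, used only to illustrate the idea.
        state_dict = checkpoint.get('network', checkpoint)
        # Copy the saved parameters into the module, overwriting its current values.
        network.load_state_dict(state_dict)
        return network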

About your question "it seems the novel classes are always the last five classes (labeled with 59~63) ...". That's true. In the code, the base categories are assigned ids in the range [0, num_base_categories-1] and the novel categories are assigned ids in the range [num_base_categories, num_base_categories + num_novel_categories - 1]. Those ids are not the same as the label numbers that the categories might have, and they are different on each training iteration (depending on the sampled base and novel categories).
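
As a toy illustration of this id scheme (the numbers below are made up, not taken from the code):

    # Say an episode samples 4 base categories and 2 novel categories.
    sampled_base_labels = [17, 3, 42, 55]   # dataset labels of the base classes
    sampled_novel_labels = [88, 91]         # dataset labels of the novel classes

    num_base = len(sampled_base_labels)
    num_novel = len(sampled_novel_labels)

    # Base categories always receive ids 0..num_base-1 and novel categories
    # num_base..num_base+num_novel-1, regardless of their dataset labels.
    label_to_episode_id = {lab: i for i, lab in enumerate(sampled_base_labels + sampled_novel_labels)}
    print(label_to_episode_id)  # {17: 0, 3: 1, 42: 2, 55: 3, 88: 4, 91: 5}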

gidariss commented 6 years ago

About your 1st question again. As I said, during the training procedure the training categories (which are the same as the base categories) are used both as base categories and as "fake" novel categories. This happens in the following place in the code: https://github.com/gidariss/FewShotWithoutForgetting/blob/master/dataloader.py#L302 (see lines 302 till 310). In contrast, during test time the base and novel categories do not overlap; see the following place in the code: https://github.com/gidariss/FewShotWithoutForgetting/blob/master/dataloader.py#L294 (see lines 294 till 300).
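
A minimal sketch of that sampling logic, assuming a much-simplified dataloader (the function and variable names below are made up, not the repo's):

    import random

    def sample_episode_categories(base_ids, novel_pool_ids, n_novel=5, training=True):
        # During training the "novel" categories of an episode are drawn from the
        # base/training categories themselves, so a class can play both roles.
        # During testing the novel categories come from a disjoint val/test pool.
        if training:
            novel = random.sample(base_ids, n_novel)
        else:
            novel = random.sample(novel_pool_ids, n_novel)
        return base_ids, novel

    base_ids = list(range(64))            # e.g., the 64 MiniImageNet training classes
    novel_pool_ids = list(range(64, 84))  # e.g., 20 disjoint test classes
    print(sample_episode_categories(base_ids, novel_pool_ids, training=True))
    print(sample_episode_categories(base_ids, novel_pool_ids, training=False))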

kailigo commented 6 years ago

@gidariss , Thanks for the quick reply. But I am still unclear about a few points.

  1. Why is it necessary to use the training categories as both novel and base classes during training? Based on my understanding, the weight generator could still work without this. Why not do during training what is done during testing? What is the point here?

  2. What I was referring to by "weights" is the weights of the last fully-connected layer, not the feature extractor. I know you load the feature extractor weights obtained in the first training stage, but I am not sure you do so for the classifier weights (fully-connected layer), because you initialize those weights in the init function of the classifier:

        weight_base = torch.FloatTensor(nKall, nFeat).normal_(
            0.0, np.sqrt(2.0/nFeat))
        self.weight_base = nn.Parameter(weight_base, requires_grad=True)
  3. If the ids do not correspond to the real labels, how can you ensure the correspondence between the weight of a class (a row or a column of the weight matrix in the fully-connected layer) and the real labels? In other words, how do you organize the class weights (vectors) in a matrix such that they align with the real labels?

Thanks.

kailigo commented 6 years ago

Continue above:

Point 1: I noticed that you concatenate the weights of the base classes and the novel classes. If there are overlaps between the base and the novel classes, the resulting weight matrix will not cover all the classes in the training set: some classes are missing because their weights are replaced by those of the overlapping classes, which appear multiple times. In that case, how can you use a weight matrix that does not cover all the training classes to classify test data that are sampled from all the classes in the training set?

gidariss commented 6 years ago

@kailigo,

  1. During the training procedure you do not have any novel categories available (they are called novel because they are given to you only after training). Therefore, in order to train the few-shot classification weight generator, in each training iteration I sample from the base categories N categories (e.g., N=5 in MiniImageNet) that act as novel categories for the purposes of that training iteration. For more details, I would suggest you read the 2nd training stage paragraph in section 3.3 of the paper more carefully: https://arxiv.org/pdf/1804.09458.pdf.

  2. I do randomly initialize the classification weights as you said, but then they are overwritten with the classification weights that were already learned in the 1st training stage. In the example from my previous post, notice line 37 of the configuration file: https://github.com/gidariss/FewShotWithoutForgetting/blob/master/config/miniImageNet_Conv128CosineClassifierGenWeightAttN1.py#L37 (a minimal stand-in sketch of this overwrite is included after this list).

  3. Keeping the correspondence between real labels and ids is the job of the kids variable: https://github.com/gidariss/FewShotWithoutForgetting/blob/master/algorithms/FewShot.py#L161. If you follow its computational flow, you will find that at https://github.com/gidariss/FewShotWithoutForgetting/blob/master/architectures/ClassifierWithFewShotGenerationModule.py#L185 it is used to index the proper base classification weight vectors from the class weights. This kids variable is created by the dataloader here (where it is named Kall): https://github.com/gidariss/FewShotWithoutForgetting/blob/master/dataloader.py#L412
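
To show the overwrite mentioned in point 2, here is a minimal stand-in module (not the repo's classifier; only meant to illustrate that loading the stage-1 checkpoint replaces the random initialization):

    import numpy as np
    import torch
    import torch.nn as nn

    class TinyCosineClassifier(nn.Module):
        """Minimal stand-in for the classifier head, just to show the overwrite."""
        def __init__(self, nKall=64, nFeat=128):
            super().__init__()
            weight_base = torch.FloatTensor(nKall, nFeat).normal_(0.0, np.sqrt(2.0 / nFeat))
            self.weight_base = nn.Parameter(weight_base, requires_grad=True)

    # "1st training stage": pretend these are the learned classification weights.
    stage1 = TinyCosineClassifier()
    checkpoint = {'network': stage1.state_dict()}

    # "2nd training stage": the module is first built with random weights...
    stage2 = TinyCosineClassifier()
    # ...and then those random values are overwritten by the stage-1 checkpoint,
    # which is what specifying the 'pretrained' key triggers.
    stage2.load_state_dict(checkpoint['network'])
    assert torch.equal(stage2.weight_base, stage1.weight_base)

And, for point 3, a small sketch of how such an ids tensor can be used to pick out the matching rows of the base weight matrix (toy values, not the repo's code):

    import torch

    nKall, nFeat = 64, 128
    weight_base = torch.randn(nKall, nFeat)  # one classification weight vector per training class

    # Toy version of Kall/kids: for each category slot in the episode, the real
    # label id of the category placed there (values made up for the example).
    Kids = torch.LongTensor([17, 3, 42, 55, 8])

    # Indexing weight_base with Kids selects the weight vectors in the order the
    # categories appear in the episode, keeping the episode-local ids aligned
    # with the correct rows of the weight matrix.
    episode_weights = weight_base[Kids]  # shape: (5, 128)
    # equivalently: torch.index_select(weight_base, 0, Kids)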