chenxin061 / pdarts

Code for our paper "Progressive Differentiable Architecture Search: Bridging the Depth Gap between Search and Evaluation"

How many layers do we need in the evaluation stage? #27

Closed NdaAzr closed 4 years ago

NdaAzr commented 4 years ago

@chenxin061 As I understood from DARTS and PDARTS, for the evaluation stage we can use the best cell or a stack of the best cell. Do you think it would be OK to use 1 layer instead of 20 in `parser.add_argument('--layers', type=int, default=1, help='total number of layers')`, since using 1 or 20 layers gives the same accuracy on my custom dataset?

Many thanks

chenxin061 commented 4 years ago


The 20-cell setting is for a fair comparison with other methods. For your custom dataset, I think the target should be better performance, so I recommend using the setting that performs best on your own dataset.

NdaAzr commented 4 years ago

@chenxin061 Many thanks for your prompt reply. Regarding the number of layers, I have a question relevant to the DARTS method.

For the evaluation stage, I trained the dataset with 1 layer and ran another experiment with 2 layers. I expected the 2-layer DARTS to have double the number of parameters of the 1-layer DARTS, since the cells are the same. However, I get 33K parameters for 1 layer and 139K for 2 layers. Do you have any idea why?

P.S. I used this function to calculate the number of parameters; with or without the `p.requires_grad` condition it gives the same count.

    def count_parameters(model):
        return sum(p.numel() for p in model.parameters() if p.requires_grad)

    print("number of parameters in the model is:", count_parameters(model))
chenxin061 commented 4 years ago


There are some stem convolutions that affect the parameter count.
Besides, the first normal cell is different from the other normal cells: its input channel count is C while the others' is 4C in the case of DARTS, which results in more parameters.
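A back-of-the-envelope illustration of why a second cell more than doubles the parameter count: its input width is 4C rather than C, so its convolutions carry roughly four times the weights of the first cell's. This stands in for a real DARTS cell with a single 3x3 convolution, and the width C = 16 is hypothetical:

```python
def conv2d_params(c_in, c_out, k, bias=False):
    # Weight count of a k x k convolution, plus an optional bias term.
    return c_in * c_out * k * k + (c_out if bias else 0)

C = 16  # hypothetical initial channel width

stem  = conv2d_params(3, C, 3)      # stem convolution, shared by both models
cell1 = conv2d_params(C, C, 3)      # first cell: input width C
cell2 = conv2d_params(4 * C, C, 3)  # second cell: input is the 4C concatenation

one_cell  = stem + cell1
two_cells = stem + cell1 + cell2
print(two_cells / one_cell)  # noticeably more than 2x
```

The second cell alone has 4x the weights of the first, so the overall ratio lands well above 2, consistent with the 33K vs. 139K numbers above.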

NdaAzr commented 4 years ago

@chenxin061 Thank you for your reply. It makes sense, but why did you say 4C? The input after the first cell would be the current input and the outputs of the two previous layers, wouldn't it?

In PDARTS and DARTS, the visualize.py module plots only a single cell (e.g. the best cell). If I use a stack of two cells, and as you said cells 1 and 2 are different, how can I visualize them?

chenxin061 commented 4 years ago


The key difference is that the inputs of the first cell come from the stem convolutions, while one of the inputs of the second cell is the output of the first cell, which is the concatenation of the 4 intermediate nodes of the first cell (4C channels).

In the code of DARTS and P-DARTS, whatever the number of input channels is, it is reshaped to C (supposing the output channel count of this cell is 4C) by a 1×1 convolution. Please refer to model.py for more details.
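A toy sketch of this channel bookkeeping (the node count of 4 matches DARTS; the rest is simplified and ignores the two-input structure and reduction cells):

```python
NODES = 4  # intermediate nodes whose outputs are concatenated at a cell's output

def cell_out_channels(c_in, c):
    # A 1x1 convolution first reshapes the incoming c_in channels down to c;
    # the cell then outputs the concatenation of NODES nodes of c channels each.
    squeezed = c          # effect of the 1x1 preprocessing convolution
    return NODES * squeezed

C = 16
first  = cell_out_channels(C, C)      # input from the stem: outputs 4C
second = cell_out_channels(first, C)  # input 4C is squeezed back to C, outputs 4C again
print(first, second)  # 64 64
```

So every cell outputs 4C channels regardless of its input width, which is why the preprocessing convolution is needed.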

NdaAzr commented 4 years ago

@chenxin061 I was trying to draw the whole DARTS network for different numbers of layers (cells) used in the evaluation stage.

image

A: DARTS with 1 cell, B: DARTS with 2 cells, C: DARTS with 3 cells, and D: DARTS with 4 cells.

The question is: if we set `parser.add_argument('--layers', type=int, default=1, help='total number of layers')` to 1, why do we get a first reduction cell as in A in the figure, and no normal cell? Is there any specific reason? As far as I understood, when we set --layers = 3, we get a normal cell.

Could you please point me in the right direction if what I said is incorrect?

Many thanks

chenxin061 commented 4 years ago

@NdaAzr If you want to apply this code to a network with less than 5 cells, you should consider modifying the network definition in line 122 of model.py.
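For reference, DARTS-style network definitions typically decide which cells are reduction cells with integer division over the depth, along the lines of the sketch below (check model.py for the exact expression used in this repository):

```python
def cell_types(layers):
    # Reduction cells sit at one third and two thirds of the depth;
    # with very few layers the two indices collapse toward 0.
    reduction_at = {layers // 3, 2 * layers // 3}
    return ["reduction" if i in reduction_at else "normal" for i in range(layers)]

print(cell_types(1))  # ['reduction'] -- why --layers 1 yields no normal cell
print(cell_types(3))  # ['normal', 'reduction', 'reduction']
```

With 1 layer, both indices evaluate to 0, so the only cell is a reduction cell, matching figure A above.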

NdaAzr commented 4 years ago

@chenxin061 Thank you for the reply. For now, I don't want to change the network; I just want to make sure that what I understood from DARTS is correct: for example, if we use 1 layer, we only have a reduction cell as in (A), and if we use 2 layers, we have 2 reduction cells as in (B), and so on.

chenxin061 commented 4 years ago

@NdaAzr For the code, yes. But I suggest you not use a network with fewer than 5 cells.

NdaAzr commented 4 years ago

@chenxin061 Thank you for the suggestion. Sorry, I am asking many questions. Is there any specific reason for this? What if it gives good performance with fewer than 5 cells on a private dataset?

chenxin061 commented 4 years ago

@NdaAzr This suggestion is based on our experiments on CIFAR and ImageNet. If you find a better setting for your private dataset, you should use it in your experiments. In any case, our objective is to find a better architecture with a proper setting.