Raschka-research-group / coral-cnn

Rank Consistent Ordinal Regression for Neural Networks with Application to Age Estimation
https://www.sciencedirect.com/science/article/pii/S016786552030413X
MIT License

About coral-cnn model #16

Open konioyxgq opened 4 years ago

konioyxgq commented 4 years ago

You used nn.AvgPool2d(7, stride=1, padding=2) at the end of the CNN. With a 120 x 120 network input, the input to this pooling layer is 4 x 4, so the pooling layer produces a 2 x 2 output in which all 4 values are the same. What is the point of this design? Or have I miscalculated?
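For reference, a quick shape check confirms the 2 x 2 output (just a minimal sketch; the 512-channel feature map is only an assumption for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical 512-channel 4 x 4 feature map (batch size 1); the channel
# count is only an assumption for illustration.
x = torch.randn(1, 512, 4, 4)

pool = nn.AvgPool2d(7, stride=1, padding=2)
print(pool(x).shape)  # torch.Size([1, 512, 2, 2])
```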

rasbt commented 4 years ago

It wouldn't really matter in this case whether you choose

nn.AvgPool2d(4)

or

nn.AvgPool2d(7, stride=1, padding=2)

or in general any kernel size >= 4.

In all cases, all of the values get averaged if the input to that layer is 4x4. It's been a while, but if I remember correctly, the reason we have this particular average pooling layer is that we were initially also experimenting with larger input images. In the end, we used 120x120 to make the comparison with the Niu et al. 2016 paper fairer, because that's the input size they used.
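Here is a minimal sketch (not the repo's code) that illustrates this on a small 4 x 4 input; note that with PyTorch's default count_include_pad=True the padded zeros are counted in the denominator, so the two layers differ only by a constant scale factor:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 4, 4)  # small 3-channel 4 x 4 feature map for illustration

out_a = nn.AvgPool2d(4)(x)                       # 1 x 1 output per channel
out_b = nn.AvgPool2d(7, stride=1, padding=2)(x)  # 2 x 2 output per channel

# Every 7 x 7 window covers the entire 4 x 4 map, so all four entries are equal.
print(torch.allclose(out_b, out_b[..., :1, :1].expand_as(out_b)))  # True

# With count_include_pad=True, each window's sum is divided by 49 instead of 16,
# so out_b is just out_a rescaled by 16/49.
print(torch.allclose(out_b[..., :1, :1], out_a * 16 / 49))  # True
```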

konioyxgq commented 4 years ago

You mean all three choices give the same result? If so, the only difference is the number of parameters, right?

rasbt commented 4 years ago

Yeah. AvgPooling doesn't have any parameters. It's just averaging the pixels.
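(A quick sanity check, just as a sketch:)

```python
import torch.nn as nn

# Average pooling has no learnable parameters.
print(sum(p.numel() for p in nn.AvgPool2d(4).parameters()))  # 0
```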

konioyxgq commented 4 years ago

Yes, but it affects the parameters of the fully connected layer.

konioyxgq commented 4 years ago

One of the fully connected layers is 512 x 1 and the other is 4 x 512.

rasbt commented 4 years ago

Oh, I see. That's because of the padding=2 then, and you may be right that there are duplicated values. We probably did this because we were initially working with larger images.

konioyxgq commented 4 years ago

Well, if the input here is 120 x 120, it should be changed to nn.AvgPool2d(4). Then there are no duplicated values, and the fully connected layer's input size becomes 512, which also reduces the number of parameters and the amount of computation.
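A rough sketch of the difference (illustrative only, assuming a 512-channel backbone output and a single-output linear layer rather than the exact model code):

```python
import torch
import torch.nn as nn

# Assumed 512-channel 4 x 4 feature map from the backbone (illustrative).
features = torch.randn(1, 512, 4, 4)

# Current setup: 2 x 2 pooled output -> 512 * 2 * 2 = 2048 flattened features.
pooled_a = nn.AvgPool2d(7, stride=1, padding=2)(features).flatten(1)
fc_a = nn.Linear(pooled_a.size(1), 1, bias=False)

# Suggested setup: 1 x 1 pooled output -> 512 flattened features.
pooled_b = nn.AvgPool2d(4)(features).flatten(1)
fc_b = nn.Linear(pooled_b.size(1), 1, bias=False)

print(pooled_a.size(1), pooled_b.size(1))          # 2048 512
print(sum(p.numel() for p in fc_a.parameters()),
      sum(p.numel() for p in fc_b.parameters()))   # 2048 512
```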

rasbt commented 4 years ago

I agree.

konioyxgq commented 4 years ago

I think it's going to affect the training process and the results will be different. What do you think?

rasbt commented 4 years ago

Yeah, I think it could make a small difference. I don't think it will substantially change anything, though, because these are just some redundancies, and it will probably affect all models equally.

rasbt commented 4 years ago

I got curious and am currently rerunning the experiments. It looks like speed is not affected (which makes sense, probably because of broadcasting in PyTorch/CUDA), but performance does seem to improve a bit. I will update things once I have all the results. It may take a while, though.