d12306 opened this issue 4 years ago
Hi @d12306, thanks for your question!
You are correct that we do not use feature extraction layers in this repo. This is because we wanted to test the capabilities of neural ODE layers specifically, not those of other convolutional layers. If you use convolutional layers before the neural ODE layer, the model performs roughly the same as it would if the neural ODE layer were not present at all. So if you add feature extraction layers before, you are really testing the performance of the feature extraction layers and not of the neural ODE operations. As has also been discussed in this paper, when feature extraction layers are used, the neural ODE layer may simply learn the identity map, in which case it is not useful.
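To make the identity-map remark concrete: an ODE block maps its input x(0) to x(T) by integrating a vector field f. If training drives f to (nearly) zero, the block returns (nearly) its input, and the layers in front of it do all the work. The sketch below is illustrative only. It uses a fixed-step Euler solver in place of the adaptive solvers normally used and plain Python lists in place of tensors. The names `odeint_euler`, `zero_field` and `small_field` are hypothetical and are not part of this repo.

```python
from typing import Callable, List

Vector = List[float]
Field = Callable[[float, Vector], Vector]

def odeint_euler(f: Field, x0: Vector, t0: float = 0.0, t1: float = 1.0,
                 steps: int = 100) -> Vector:
    """Integrate dx/dt = f(t, x) from t0 to t1 with fixed-step explicit Euler."""
    h = (t1 - t0) / steps
    x, t = list(x0), t0
    for _ in range(steps):
        dx = f(t, x)
        x = [xi + h * dxi for xi, dxi in zip(x, dx)]
        t += h
    return x

def zero_field(t: float, x: Vector) -> Vector:
    # Identically zero vector field: the state never moves, so the block is the identity.
    return [0.0] * len(x)

def small_field(t: float, x: Vector) -> Vector:
    # Nearly-zero field dx/dt = 1e-3 * x: the flow over [0, 1] scales x by roughly e^0.001.
    return [1e-3 * xi for xi in x]

features = [0.5, -1.2, 3.0]  # stand-in for the output of convolutional feature extractors
print(odeint_euler(zero_field, features))   # [0.5, -1.2, 3.0]
print(odeint_euler(small_field, features))  # almost unchanged
```

In the second case the ODE block contributes almost nothing, which is exactly the situation where a feature extractor, rather than the ODE, determines the accuracy.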
On top of this, adding convolutional layers before the neural ODE layer removes most of the advantages of neural ODEs (including invertibility, the ability to query the state at any timestep, and cheap Jacobian computations in normalizing flows), so it is not always desirable (this was also discussed in this issue).
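Invertibility and querying the state at any timestep both come from the block being the solution of an ODE. Integrating the same field backwards in time inverts the map (up to solver error), and stopping the solver at an intermediate time gives the state there. Below is a minimal pure-Python sketch. It assumes a hand-written fixed-step RK4 solver (`rk4`) and a hypothetical `field` standing in for a trained network. A generic convolutional feature extractor placed in front has no such invertibility guarantee, so only the ODE part of the model would remain invertible.

```python
import math
from typing import Callable, List

Vector = List[float]
Field = Callable[[float, Vector], Vector]

def rk4(f: Field, x0: Vector, t0: float, t1: float, steps: int = 200) -> Vector:
    """Classical fixed-step Runge-Kutta from t0 to t1; t1 < t0 integrates backwards in time."""
    h = (t1 - t0) / steps
    x, t = list(x0), t0
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k1)])
        k3 = f(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k2)])
        k4 = f(t + h, [xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        t += h
    return x

def field(t: float, x: Vector) -> Vector:
    # Hypothetical stand-in for a learned vector field (smooth and Lipschitz).
    return [math.tanh(x[1]), -math.tanh(x[0])]

x0 = [1.0, 0.5]
x1 = rk4(field, x0, 0.0, 1.0)        # forward pass: x(0) -> x(1)
x0_rec = rk4(field, x1, 1.0, 0.0)    # inverse: run the same field backwards, x(1) -> x(0)
x_half = rk4(field, x0, 0.0, 0.5)    # state at an intermediate time t = 0.5

# Reconstruction error is limited only by the solver's accuracy.
print(max(abs(a - b) for a, b in zip(x0, x0_rec)))
```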
Hope this helps!
Sure, thanks for your explanation! Great work!
When I run the CIFAR-10 experiment, 3conv+ode_layer+fc_layer achieves an accuracy of 85.84%, but the 3conv+fc_layer structure only reaches 79.07%. That doesn't seem to be in line with your statement "So if you add feature extraction layers before, you are really testing the performance of the feature extraction layers and not of the neural ODE operations". Your paper also mentions that "These initial convolutions can be understood as implicitly augmenting the space (since they increase the number of channels)". So in the experimental comparison of NODE and ANODE, isn't the practical effect of the dimensionality augmentation in ANODE similar to adding convolutional layers, which NODE doesn't have? Is that a fair comparison? @EmilienDupont
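For reference, ANODE-style augmentation on images appends p zero-valued channels before the ODE block, so the ODE is solved in R^{(c+p) x h x w} rather than R^{c x h x w}. The same result can be written as a 1x1 convolution with fixed weights [I; 0], which frames the question above concretely. The sketch below assumes channel-first nested lists with no batch dimension. `augment_with_zeros`, `conv1x1` and `zero_augment_weight` are hypothetical helpers, not functions from this repo.

```python
from typing import List

Image = List[List[List[float]]]  # channel-first: x[channel][row][col], no batch dimension

def augment_with_zeros(x: Image, p: int) -> Image:
    """Append p all-zero channels: shape (c, h, w) -> (c + p, h, w)."""
    h, w = len(x[0]), len(x[0][0])
    copied = [[list(row) for row in channel] for channel in x]
    zeros = [[[0.0] * w for _ in range(h)] for _ in range(p)]
    return copied + zeros

def conv1x1(x: Image, weight: List[List[float]]) -> Image:
    """1x1 convolution without bias: out[o][i][j] = sum_c weight[o][c] * x[c][i][j]."""
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(w_o[c] * x[c][i][j] for c in range(len(x))) for j in range(w)]
             for i in range(h)]
            for w_o in weight]

def zero_augment_weight(c: int, p: int) -> List[List[float]]:
    """Fixed (c + p) x c weight matrix [I_c; 0]: copy the c channels, then emit p zero channels."""
    return [[1.0 if o == k else 0.0 for k in range(c)] for o in range(c + p)]

x = [[[0.1, 0.2], [0.3, 0.4]],
     [[-1.0, 0.5], [2.0, 0.0]],
     [[3.0, -2.5], [1.5, 0.25]]]  # toy 3-channel 2x2 "image"
aug = augment_with_zeros(x, p=2)
print(len(x), "->", len(aug), "channels")            # 3 -> 5 channels
print(aug == conv1x1(x, zero_augment_weight(3, 2)))  # True
```

In other words, zero augmentation is a fixed, parameter-free increase in channels. The learned convolutions in the 3conv variants add channels and also transform the features, so the two are not the same intervention.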
Hi @EmilienDupont, thanks for the code implementation!
I am concerned about the architecture implemented in this repo. The architecture in the original Neural ODE paper (Chen et al.) has a feature extractor made up purely of convolutional layers, followed by the ODE representation-transformation layer and the final classification layer, whereas in this repo only ODE layers are present.
So in your experiments, when you compare against the neural ODE, does the neural ODE you used actually lack the feature extraction layer, differing from the augmented neural ODE only by the concatenated zero channels?
Personally, I do not think this is a fair comparison, since removing the feature extraction layer will affect the classification model. A better comparison would be to evaluate the two models with a proper feature extractor present.
Please correct me if I am wrong.
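One way to set up the comparisons discussed in this thread is to build every variant from the same shared components, so that any two models differ only in which blocks are present. In the sketch below, toy list-to-list functions are hypothetical stand-ins for real layers (in practice these would be e.g. PyTorch modules), and the "ode" block is the closed-form flow of dx/dt = -x rather than a learned field.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Layer = Callable[[List[float]], List[float]]

def compose(layers: Sequence[Layer]) -> Layer:
    """Apply layers left to right, in the spirit of a sequential container."""
    def model(x: List[float]) -> List[float]:
        for layer in layers:
            x = layer(x)
        return x
    return model

def build_variants(blocks: Dict[str, Layer]) -> Dict[str, Tuple[List[str], Layer]]:
    """Build each variant from the SAME block instances, so variants differ only in block choice."""
    recipes = {
        "node+fc": ["ode", "fc"],
        "anode+fc": ["augment", "ode", "fc"],
        "conv+fc": ["conv", "fc"],
        "conv+node+fc": ["conv", "ode", "fc"],
    }
    return {name: (names, compose([blocks[n] for n in names]))
            for name, names in recipes.items()}

# Hypothetical toy blocks, chosen only so the pipeline runs end to end.
blocks: Dict[str, Layer] = {
    "conv": lambda x: [2.0 * v for v in x] + [sum(x)],  # "feature extractor" that adds a dimension
    "augment": lambda x: x + [0.0, 0.0],                 # ANODE-style: append p = 2 zeros
    "ode": lambda x: [v * math.exp(-1.0) for v in x],    # exact flow of dx/dt = -x over t in [0, 1]
    "fc": lambda x: [sum(x)],                            # "classifier": a single linear readout
}

variants = build_variants(blocks)
for name, (names, model) in variants.items():
    print(f"{name:14s} {' -> '.join(names):24s} {model([1.0, 2.0])}")
```

With this layout, conv+fc versus conv+node+fc isolates the ODE block, while node+fc versus conv+node+fc mostly measures the feature extractor, which is the distinction the maintainer's reply draws.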