Closed — dylanrandle closed this issue 4 years ago
Even just a tiny CNN can get an R^2 score of 0.927 on the test set and start to overfit (https://github.com/capstone2019-neuralsearch/AC297r_2019_NAS/commit/b4f45867379dda9f1247ff0f790ce7738dabbaf0). A bigger model like ResNet seems like overkill for this small graphene dataset.
It is trained & tested on the 3x5 coarse grid, though. I'm not sure why the 30x80 fine grid is necessary, given that the two grids have a one-to-one mapping.
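For reference, a model in this spirit can be sketched in PyTorch as follows (layer sizes here are illustrative assumptions, not the exact architecture from the linked commit):

```python
import torch
import torch.nn as nn

# Illustrative tiny CNN for the 3x5 coarse-grid input: one channel in,
# a single scalar regression target out.
class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)  # 3x5 -> 3x5
        self.head = nn.Linear(16 * 3 * 5, 1)  # flatten -> scalar prediction

    def forward(self, x):
        x = torch.relu(self.conv(x))
        return self.head(x.flatten(1))

model = TinyCNN()
out = model(torch.zeros(4, 1, 3, 5))  # a batch of 4 coarse grids
```

With so few parameters, a network like this can fit a small dataset quickly, which is consistent with the early overfitting observed above.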
I have successfully trained ResNet-18 (in PyTorch) on the 30x80 fine-grid graphene data (https://github.com/capstone2019-neuralsearch/AC297r_2019_NAS/commit/b826349e01f585aaead783e05fed33b8a988e582). The R^2 is 0.87, much higher than a simple 2-layer CNN trained on the same fine-grid data (R^2 ~0.6). The entire pipeline can be reproduced by this Kaggle GPU kernel. (There is also a simple CNN version for reference.)
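The R^2 values quoted throughout are the standard coefficient of determination (1 minus residual sum of squares over total sum of squares). A minimal pure-Python sketch of the metric:

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Perfect predictions give R^2 = 1; always predicting the mean gives R^2 = 0.
print(r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # -> 1.0
print(r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # -> 0.0
```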
I also committed a reference PyTorch ResNet implementation trained on CIFAR-10 (~84% accuracy) (https://github.com/capstone2019-neuralsearch/AC297r_2019_NAS/commit/ed117940a9f43c8be62047ca0b3d2d822e3f990c); it can be reproduced at this Colab link.
It would be great to get some GCP/AWS credits so we don't have to rely so heavily on Colab/Kaggle. The free GPUs are great, but the resources are quite limited (only a single K80; training would go much faster with several V100s). Version control is also a bit annoying.
The fine-grid graphene data is a tricky problem, because we can easily hand-design a tiny NN that works very well:
This works because both the training and the test sets can be perfectly encoded by the much lower-dimensional coarse-grid version.
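To illustrate the point with a toy sketch (not the actual dataset code): if the 30x80 fine grid is a block-wise upsampling of the 3x5 coarse grid — an assumption consistent with the one-to-one mapping mentioned above — then each coarse cell corresponds to a fixed 10x16 block of fine cells, and a single pooling step already recovers the low-dimensional encoding:

```python
def block_downsample(fine, bh=10, bw=16):
    """Average each bh x bw block of a 30x80 grid to get a 3x5 grid.
    Assumes the fine grid is a block-wise upsampling of the coarse one."""
    H, W = len(fine), len(fine[0])
    coarse = []
    for i in range(0, H, bh):
        row = []
        for j in range(0, W, bw):
            block = [fine[a][b] for a in range(i, i + bh)
                                for b in range(j, j + bw)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse

# Upsample a toy 3x5 grid to 30x80, then recover it exactly.
coarse = [[r * 5 + c for c in range(5)] for r in range(3)]
fine = [[coarse[i // 10][j // 16] for j in range(80)] for i in range(30)]
recovered = block_downsample(fine)
```

A network whose first layer performs this pooling effectively reduces the fine-grid problem back to the coarse-grid one.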
I got R^2 ~ 0.92 by training ResNet-18 for more epochs; see this Kaggle notebook. This matches the accuracy in the original paper (also R^2 ~ 0.92).
Another interpretation of "hand-designed architecture" is to use a residual block for each cell, replacing DARTS's learned cell as shown in their paper:
As I understand, a residual block inside the DARTS framework would look like:
- c_{k-2} should not be used at all; the macro-cells are just stacked sequentially.
- c_{k-1} should go through a conv (which means Conv2d + BN + ReLU) to node 0, and then another conv to node 1.
- node 1 should just be the output.
- node 2 and node 3 are not used. Alternatively, they can repeat what node 0 and node 1 do, and then node 3 will be the output.

The diagram looks like (plotted via https://github.com/capstone2019-neuralsearch/AC297r_2019_NAS/commit/f3e647d1e3097fbbc8c17a4b9dd9406548dcb7ee):
or
One problem with implementation: the DARTS code always concats four intermediate nodes as cell output, but here we just want one intermediate node without any concatenation.
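One way around this, if we keep DARTS's concat machinery as-is, is to note that concatenating a single node along the channel dimension is effectively the identity, so selecting only one node index in the concat list yields the desired single-node output (a sketch of the idea, not the DARTS code itself):

```python
import torch

node_output = torch.randn(2, 8, 4, 4)  # (batch, channels, H, W)

# DARTS concatenates the selected intermediate nodes along the channel dim;
# with a single selected node, the concat is a no-op.
cell_output = torch.cat([node_output], dim=1)

assert torch.equal(cell_output, node_output)
```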
Another question is whether this whole idea is worth implementing. The project documentation says "ResNets" instead of "Residual blocks":
Understand and compare state-of-the-art architectures → VGG, GoogLeNet, ResNets, DenseNets, Highway Networks, etc.
I also have something similar, although I am using the c_{k-2} output as well, as my "first" residual connection.
Important note: the visualization does not accurately reflect the concatenation. As can be seen, e.g., here, the concatenation is only applied over the nodes specified in normal_concat or reduce_concat. For example, the code would look like:
# Hand-designed ResNet Architecture
RESNET = Genotype(
    normal=[
        ('skip_connect', 0),
        ('sep_conv_3x3', 1),
        ('skip_connect', 1),
        ('sep_conv_3x3', 2),
        ('skip_connect', 2),
        ('sep_conv_3x3', 3),
        ('skip_connect', 3),
        ('sep_conv_3x3', 4)],
    normal_concat=[5],  # whatever the idx of the last box is
    reduce=[
        ('skip_connect', 0),
        ('sep_conv_3x3', 1),
        ('skip_connect', 1),
        ('sep_conv_3x3', 2),
        ('skip_connect', 2),
        ('sep_conv_3x3', 3),
        ('skip_connect', 3),
        ('sep_conv_3x3', 4)],
    reduce_concat=[5])  # whatever the idx of the last box is
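For context, Genotype in the DARTS codebase is just a namedtuple, so the structure can be checked standalone (a sketch using the same field names; the op lists here are abbreviated placeholders):

```python
from collections import namedtuple

# Genotype is defined this way in the DARTS repo's genotypes.py.
Genotype = namedtuple('Genotype', 'normal normal_concat reduce reduce_concat')

# A one-element concat list means the cell output is a single node,
# i.e. no real concatenation happens.
g = Genotype(normal=[('skip_connect', 0), ('sep_conv_3x3', 1)],
             normal_concat=[5],
             reduce=[('skip_connect', 0), ('sep_conv_3x3', 1)],
             reduce_concat=[5])
```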
In order to match the number of parameters in DARTS, I shrank the standard ResNet-18 into a "ResNet-10" and reduced the number of filters by a factor of 8 (from 64, 128, ... to 8, 16, ...). The model now has only 77k parameters, compared to >10,000k parameters in the standard ResNet-18 (see the sizes of common models). The R^2 is still ~0.92. See this Kaggle notebook to reproduce the result. The reference implementation is torchvision.models.resnet.
The difference between "ResNet-10" and ResNet-18 is basically:
def ResNet10(**kwargs):
    return ResNet(BasicBlock, [1, 1, 1, 1], **kwargs)

def ResNet18(**kwargs):
    return ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)
From the ResNet paper, [2, 2, 2, 2] gives the number of residual blocks in each of the four stages. Changing it to [1, 1, 1, 1] reduces the layer count from 8*2 + 2 = 18 to 4*2 + 2 = 10 (each BasicBlock contributes two conv layers, plus the stem conv and the final fc layer).
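The arithmetic above can be made explicit with a small helper (a sketch; the function name is mine):

```python
def resnet_depth(blocks_per_stage):
    """Count weighted layers in a BasicBlock ResNet:
    2 convs per block, plus the stem conv and the final fc layer."""
    return 2 * sum(blocks_per_stage) + 2

print(resnet_depth([2, 2, 2, 2]))  # -> 18 (ResNet-18)
print(resnet_depth([1, 1, 1, 1]))  # -> 10 ("ResNet-10")
```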
We want to use famous/well-known architectures (cells) on the datasets.
Make this as independent of the dataset as possible.