D-X-Y / AutoDL-Projects

Automated deep learning algorithms implemented in PyTorch.
MIT License

entropy of arch-parameter #56

Closed chunhuizng closed 4 years ago

chunhuizng commented 4 years ago

Thanks for your awesome work! I am interested in the entropy of the arch-parameters in GDAS. I reproduced the search on the NASNet search space with this script:

CUDA_VISIBLE_DEVICES=0 OMP_NUM_THREADS=4 nohup python ./exps/algos/GDAS.py \
 --search_space_name darts \
 --config_path  configs/search-opts/GDAS-NASNet-CIFAR.config \
 --model_config configs/search-archs/GDAS-NASNet-CIFAR.config \
 --tau_max 10 --tau_min 0.1 --track_running_stats 1 \
 --arch_learning_rate 0.0003 --arch_weight_decay 0.001 \
 --workers 4 --print_freq 200 --rand_seed 3

After 250 epochs, the log shows the following arch-parameters:

GDAS:
arch-normal-parameters :
tensor([[0.1298, 0.1118, 0.1363, 0.1426, 0.1277, 0.1364, 0.1098, 0.1056],
        [0.1367, 0.1312, 0.1239, 0.1205, 0.1316, 0.1265, 0.1155, 0.1141],
        [0.1194, 0.1204, 0.1315, 0.1375, 0.1278, 0.1243, 0.1180, 0.1210],
        [0.1201, 0.1329, 0.1248, 0.1284, 0.1220, 0.1194, 0.1222, 0.1303],
        [0.1642, 0.1209, 0.1264, 0.1283, 0.1170, 0.1162, 0.1196, 0.1075],
        [0.1074, 0.1199, 0.1356, 0.1319, 0.1254, 0.1245, 0.1222, 0.1332],
        [0.1124, 0.1192, 0.1318, 0.1275, 0.1204, 0.1243, 0.1318, 0.1325],
        [0.1347, 0.1266, 0.1277, 0.1316, 0.1182, 0.1189, 0.1241, 0.1182],
        [0.1616, 0.1139, 0.1292, 0.1359, 0.1217, 0.1148, 0.1153, 0.1076],
        [0.1081, 0.1160, 0.1341, 0.1488, 0.1162, 0.1271, 0.1209, 0.1288],
        [0.1147, 0.1228, 0.1243, 0.1292, 0.1209, 0.1241, 0.1260, 0.1378],
        [0.1314, 0.1234, 0.1236, 0.1280, 0.1197, 0.1146, 0.1279, 0.1313],
        [0.1449, 0.1207, 0.1264, 0.1269, 0.1195, 0.1221, 0.1213, 0.1184],
        [0.1504, 0.1169, 0.1286, 0.1298, 0.1244, 0.1275, 0.1184, 0.1039]])
arch-reduce-parameters :
tensor([[0.1213, 0.1122, 0.1344, 0.1396, 0.1237, 0.1307, 0.1252, 0.1130],
        [0.1426, 0.1135, 0.1283, 0.1378, 0.1215, 0.1263, 0.1201, 0.1099],
        [0.1162, 0.1188, 0.1332, 0.1349, 0.1236, 0.1292, 0.1282, 0.1160],
        [0.1299, 0.1170, 0.1237, 0.1397, 0.1184, 0.1280, 0.1249, 0.1184],
        [0.1444, 0.1202, 0.1307, 0.1356, 0.1231, 0.1266, 0.1140, 0.1055],
        [0.1147, 0.1179, 0.1265, 0.1499, 0.1178, 0.1203, 0.1287, 0.1241],
        [0.1285, 0.1207, 0.1219, 0.1291, 0.1201, 0.1260, 0.1273, 0.1262],
        [0.1360, 0.1195, 0.1324, 0.1285, 0.1220, 0.1306, 0.1197, 0.1112],
        [0.1405, 0.1225, 0.1253, 0.1323, 0.1273, 0.1332, 0.1151, 0.1037],
        [0.1176, 0.1202, 0.1264, 0.1417, 0.1217, 0.1225, 0.1246, 0.1253],
        [0.1225, 0.1205, 0.1220, 0.1375, 0.1196, 0.1242, 0.1273, 0.1264],
        [0.1292, 0.1276, 0.1255, 0.1274, 0.1200, 0.1286, 0.1245, 0.1172],
        [0.1341, 0.1236, 0.1259, 0.1339, 0.1259, 0.1275, 0.1176, 0.1114],
        [0.1363, 0.1143, 0.1330, 0.1337, 0.1297, 0.1312, 0.1170, 0.1047]])

By comparison, DARTS reaches a lower entropy on its arch-params after the search period:

####### DARTS  ALPHA #######
# Alpha - normal
tensor([[0.0899, 0.0729, 0.1902, 0.1657, 0.1178, 0.0960, 0.1508, 0.1167],
        [0.0499, 0.0417, 0.0843, 0.1648, 0.0861, 0.0959, 0.0914, 0.3860]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
tensor([[0.0909, 0.0831, 0.1966, 0.1740, 0.0911, 0.1148, 0.0883, 0.1613],
        [0.0472, 0.0429, 0.0835, 0.1039, 0.0959, 0.1019, 0.0923, 0.4323],
        [0.0292, 0.0256, 0.0662, 0.0505, 0.0500, 0.0678, 0.0843, 0.6264]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
tensor([[0.0846, 0.0582, 0.1223, 0.0983, 0.0966, 0.1338, 0.0989, 0.3074],
        [0.0520, 0.0382, 0.0741, 0.0734, 0.0926, 0.0695, 0.0577, 0.5425],
        [0.0249, 0.0218, 0.0524, 0.0552, 0.0496, 0.0504, 0.0624, 0.6834],
        [0.0155, 0.0153, 0.0248, 0.0321, 0.0300, 0.0485, 0.0351, 0.7985]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
tensor([[0.0855, 0.0581, 0.1236, 0.1014, 0.0807, 0.0872, 0.0883, 0.3752],
        [0.0322, 0.0277, 0.0533, 0.0736, 0.0476, 0.0545, 0.0485, 0.6627],
        [0.0165, 0.0147, 0.0327, 0.0263, 0.0266, 0.0303, 0.0300, 0.8229],
        [0.0121, 0.0120, 0.0183, 0.0208, 0.0179, 0.0292, 0.0293, 0.8604],
        [0.0090, 0.0094, 0.0128, 0.0155, 0.0135, 0.0216, 0.0306, 0.8875]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)

# Alpha - reduce
tensor([[0.1978, 0.1735, 0.1204, 0.1005, 0.1259, 0.1252, 0.0847, 0.0719],
        [0.1206, 0.1170, 0.1283, 0.1022, 0.1397, 0.1037, 0.1071, 0.1813]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
tensor([[0.1839, 0.1749, 0.1024, 0.1225, 0.1516, 0.0880, 0.1007, 0.0759],
        [0.1134, 0.1146, 0.1024, 0.1367, 0.1185, 0.1722, 0.1148, 0.1275],
        [0.0814, 0.1012, 0.1834, 0.1091, 0.1084, 0.1327, 0.1303, 0.1535]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
tensor([[0.2194, 0.2491, 0.1104, 0.0840, 0.0993, 0.0926, 0.0752, 0.0699],
        [0.1264, 0.1375, 0.1261, 0.1583, 0.1097, 0.1367, 0.0915, 0.1137],
        [0.0777, 0.1084, 0.2286, 0.1024, 0.0895, 0.1121, 0.1253, 0.1559],
        [0.0693, 0.0847, 0.1822, 0.1301, 0.1072, 0.1233, 0.1398, 0.1635]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)
tensor([[0.1893, 0.2391, 0.0836, 0.1049, 0.1074, 0.1131, 0.0914, 0.0712],
        [0.1107, 0.1356, 0.2006, 0.1145, 0.0955, 0.1010, 0.1016, 0.1405],
        [0.0685, 0.0943, 0.1735, 0.1858, 0.1071, 0.1072, 0.1321, 0.1315],
        [0.0621, 0.0764, 0.1957, 0.0852, 0.1133, 0.1017, 0.1207, 0.2449],
        [0.0519, 0.0710, 0.1500, 0.0831, 0.0794, 0.0926, 0.1218, 0.3501]],
       device='cuda:0', grad_fn=<SoftmaxBackward>)

Has the entropy of the GDAS arch-params in my reproduction dropped enough, compared to your experiments? I am curious about the alphas of your searched models.
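For reference, the entropy being discussed can be measured directly from the printed matrices. A minimal sketch (my own helper, not part of the repo's code): each row is already a softmax distribution over the 8 candidate operations, so its Shannon entropy is -sum(p * log p), with a maximum of ln(8) ≈ 2.079 for a uniform row.

```python
import math
import torch

def arch_entropy(probs: torch.Tensor) -> torch.Tensor:
    """Row-wise Shannon entropy (in nats) of a matrix of softmax probabilities."""
    # clamp_min avoids log(0) if any probability underflows to zero
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

# First two rows of the GDAS arch-normal-parameters from the log above.
gdas = torch.tensor([[0.1298, 0.1118, 0.1363, 0.1426, 0.1277, 0.1364, 0.1098, 0.1056],
                     [0.1367, 0.1312, 0.1239, 0.1205, 0.1316, 0.1265, 0.1155, 0.1141]])
print(arch_entropy(gdas))         # both rows stay close to ln(8) ~= 2.079
print(math.log(8))                # upper bound for 8 candidate ops
```

Applied to the logs above, the GDAS rows are still nearly uniform (entropy close to the ln(8) bound), while the later DARTS alpha rows are far more peaked and therefore lower-entropy.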

D-X-Y commented 4 years ago

log-seed-604.txt

This is one log file that I can find now. Actually, I did not monitor the entropy before. Good point. I'm now extending my GDAS on ImageNet, and will let you know how the entropy changed.

chunhuizng commented 4 years ago

> log-seed-604.txt
>
> This is one log file that I can find now. Actually, I did not monitor the entropy before. Good point. I'm now extending my GDAS on ImageNet, and will let you know how the entropy changed.

Thank you for your detailed reply! I am interested in how the GDAS entropy changes on ImageNet. Do you have any new results? Looking forward to your reply.

D-X-Y commented 4 years ago

I tried the ProxylessNAS search space, where the entropy decreases from 1.8 to 1.5,