Hi Jiahui,
I found a strange issue while training US-ResNet50. When I follow the sandwich rule and randomly choose two mid widths per iteration, training is slow. But when I use two fixed random widths (similar to training an S-Net; say the width list is fixed at [0.25, 0.65424, 0.76534, 1.0]), training is much faster (more than 2x). Do you see the same behavior? I would expect the training speed to be comparable.
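For reference, the two setups being compared can be sketched roughly as follows. This is a minimal illustration, assuming a width range of [0.25, 1.0] and two mid widths drawn uniformly with Python's `random` module. The function name `sandwich_widths` and the constant `FIXED_WIDTHS` are hypothetical, not taken from the US-Net code.

```python
import random

# Hypothetical fixed width list, as in the S-Net-like setup described above
FIXED_WIDTHS = [0.25, 0.65424, 0.76534, 1.0]


def sandwich_widths(min_width=0.25, max_width=1.0, num_mid=2):
    """Widths for one training iteration under the sandwich rule:
    always the smallest and largest width, plus num_mid widths
    sampled uniformly at random in between (assumed range)."""
    mids = [random.uniform(min_width, max_width) for _ in range(num_mid)]
    return [min_width] + mids + [max_width]


if __name__ == "__main__":
    random.seed(0)
    # Sandwich rule: the two mid widths change every iteration
    for _ in range(3):
        print(sandwich_widths())
    # Fixed setup: the same four widths every iteration
    print(FIXED_WIDTHS)
```

The only difference between the two setups is whether the mid widths are redrawn each iteration or reused. The slowdown therefore has to come from something that depends on the widths changing.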
That's very strange. On my side the speed is similar. Could the time-consuming part be the method you use to randomly sample the floating-point widths? I use Python's default random library.
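One way to check this hypothesis is to time the sampling step on its own. The sketch below assumes the two mid widths are drawn with `random.uniform` over [0.25, 1.0]. The function name `sample_two_mid_widths` is hypothetical. If it reports on the order of microseconds per call, the sampling itself is unlikely to account for a 2x slowdown per training iteration.

```python
import random
import timeit


def sample_two_mid_widths(min_width=0.25, max_width=1.0):
    # Two floating-point widths drawn with Python's built-in random module
    return [random.uniform(min_width, max_width) for _ in range(2)]


if __name__ == "__main__":
    n = 100_000
    # Total wall-clock seconds for n calls to the sampler
    total = timeit.timeit(sample_two_mid_widths, number=n)
    print(f"{total / n * 1e6:.3f} microseconds per call")
```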