Hi Haoran @ranery
Evaluation, BatchNorm recalibration and architecture ranking are automatically performed after every supernet training stage (6 epochs per stage by default). The best architecture of the current stage is the top-1 encoding saved in the path rank file.
For example, our saved encodings are provided here: https://github.com/changlin31/BossNAS/tree/main/ranking_mbconv/path_rank The best architecture's encoding is the combination of the first entries in each file, i.e. 11 0110 0111 1100 0110. (0 & 1 represent the first and the second candidate operation, respectively.)
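For reference, a minimal sketch of how the per-stage rank files can be combined into the final encoding (the file names/format and the assumption that the top-1 path sits on the first line are mine, not the repository's documented layout):

```python
import os

# A minimal sketch, assuming one plain-text rank file per supernet training
# stage with the top-1 path encoding on its first line (the actual file
# names/format are whatever the path_rank directory above contains).
RANK_DIR = "ranking_mbconv/path_rank"

best_encoding = []
for fname in sorted(os.listdir(RANK_DIR)):
    with open(os.path.join(RANK_DIR, fname)) as f:
        top1 = f.readline().strip()  # first entry = best path of this stage
    best_encoding.append(top1)

# Concatenating the per-stage top-1 encodings gives the searched architecture,
# e.g. "11 0110 0111 1100 0110" for the released MBConv ranking.
print(" ".join(best_encoding))
```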
Got it, thanks for your response!
I encountered another problem when resuming from a checkpoint to continue training.
RuntimeError: Given groups=1, weight of size 1024 512 3 3, expected input[64, 1024, 28, 28] to have 512 channels, but got 1024 channels instead
It occurs when execution reaches:
```python
if fmap_size > self.fmap_size:
    # downsampling branch: used when the incoming feature map is larger
    # than this block's resolution
    residual = self.downsample_d(x)
    x = self.conv1_d(x)
    x = self.peg_d(x)
    x = self.bn1_d(x)
```
Do you have any thoughts regarding this issue?
I need more information to pin down the problem, but my guess is that this issue happens because you resumed from a different stage.
Which code are you running? Are you searching the MBConv search space or the HyTra search space? Are you resuming from the first stage or not?
You can try modifying the start_block here to match your checkpoint. https://github.com/changlin31/BossNAS/blob/e8d92e86fafe55466eafe13e33b8959f8347095f/searching/configs/hytra_bs64_accumulate8_ep6_multi_aug_gpus8.py#L12
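As a sketch of what that change could look like (the 0-indexing of stages is my assumption; check the linked config line for the actual value):

```python
# Excerpt-style sketch of searching/configs/hytra_bs64_accumulate8_ep6_multi_aug_gpus8.py
# (other config keys omitted). Set start_block to the stage your checkpoint
# was saved at, e.g. to resume supernet training from the third stage:
start_block = 2  # assumed 0-indexed; must match the resumed checkpoint
```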
Thanks for your timely response, I figured this issue out! But I found that my searched block is quite regular: [1, 1, 1, 1] → [1, 1, 1, 1] → [0, 0, 0, 0] → [0, 0, 0, 0]
I am wondering whether you have an explanation for this structure, since it is quite different from your retrained ones.
Could you also elaborate on the rationale behind your search method? Specifically, how do you decide whether to downsample at the current stage?
This block partition is only used for supernet training. A different partition does not change the architecture. Your searched architecture is actually [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0].
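As a small illustration (plain Python, not code from the repository), the per-stage lists simply concatenate into the full encoding:

```python
# The [4, 4, 4, 4] block partition is only a supernet-training convenience;
# concatenating the per-stage choices recovers the full 16-block encoding.
stage_choices = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
architecture = [op for stage in stage_choices for op in stage]
assert architecture == [1] * 8 + [0] * 8
```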
This case is actually very likely to happen, as conv is better than attn in early stages and attn is better than conv in later stages.
Your searched architecture is actually downsampled to the smallest scale from the beginning. We have 6 candidate operations in HyTra Search Space:
0: ResAttn @ 7x7
1: ResConv @ 7x7
2: ResAttn @ 14x14
3: ResConv @ 14x14
4: ResConv @ 28x28
5: ResConv @ 56x56
The number after @ is the feature-map resolution.
We have now added a restriction to avoid the case of downsampling across multiple scales. Hope this solves the problem.
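The restriction can be read as a validity check on candidate paths; here is a minimal sketch of the idea (the function and the exact rule are my assumptions, not the repository's implementation):

```python
# Candidate operations 0-5 as listed above, mapped to their feature-map sizes.
OP_RESOLUTION = {0: 7, 1: 7, 2: 14, 3: 14, 4: 28, 5: 56}

def downsamples_at_most_one_scale(path):
    """Reject paths where any block shrinks the resolution by more than 2x at once."""
    for prev, cur in zip(path, path[1:]):
        if OP_RESOLUTION[cur] * 2 < OP_RESOLUTION[prev]:
            return False
    return True

# Jumping from a 56x56 block straight to a 14x14 block would be rejected.
assert not downsamples_at_most_one_scale([5, 3])
assert downsamples_at_most_one_scale([5, 4, 3, 1])
```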
Thanks for your clarification!
Hi, thanks for your great work!
I tried using the provided searching code to train the supernet, but I could not figure out how to search for candidate architectures from such a supernet.
I guess the validation hook serves this function, but I did not find any saved path information after training for one epoch. Are there other files I need to look at, or should I just wait for more epochs to be trained?
Could you advise me on that? Thanks in advance for your time and help!
Best, Haoran