facebookresearch / NASViT

code for NASViT

Release of trained checkpoint #9

Open NKSagarReddy opened 1 year ago

NKSagarReddy commented 1 year ago

Hi, thank you for your work! I am trying to reproduce some of the results mentioned in the paper. Could you please share the NASViT (A0-A5) checkpoints as well?

maryanpetruk commented 1 year ago

I would like to upvote this. @dilinwang820, could you please suggest how to evaluate the results from Table 13 of the paper?

maryanpetruk commented 1 year ago

On page 18 of the paper one can see the architecture configurations of the A1-A4 models, but only the MBConv-1 to MBConv-3 configurations are listed there. What are MBConv-4 to MBConv-7, which appear in misc/config.py?

[Screenshot: architecture configuration table from the paper]

maryanpetruk commented 1 year ago

Maybe @ChengyueGongR @dilinwang820 @endernewton @stephenyan1231 @yuandong-tian could help us with this issue?

We would like to verify the results from the paper. Thank you

dilinwang820 commented 1 year ago

Hello, my apologies, I totally missed this thread. Each checkpoint in the table is a subnet of the supernet; with that said, there are no additional weights needed for the checkpoints above. Would you be able to slice a checkpoint from the supernet shared in the repo?
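
For reference, slicing usually looks something like the sketch below. This assumes the AttentiveNAS-style `set_active_subnet` / `get_active_subnet` API that this codebase builds on (the method names and the architecture values are illustrative, not an official recipe):

```python
import torch
from torch import nn

def slice_subnet(supernet: nn.Module, ckpt_path: str) -> nn.Module:
    """Load supernet weights, activate one architecture, return the sliced subnet.

    Sketch only: assumes AttentiveNAS-style set_active_subnet/get_active_subnet
    methods; the architecture values below are purely illustrative.
    """
    ckpt = torch.load(ckpt_path, map_location='cpu')
    supernet.load_state_dict(ckpt['state_dict'])
    # Activate one architecture inside the weight-shared supernet.
    supernet.set_active_subnet(
        resolution=192,
        width=[16, 16, 24, 32, 64, 112, 160, 216, 1792],
        depth=[1, 3, 3, 4, 3, 3, 3],
        kernel_size=[3, 3, 3, 3, 3, 3, 3],
        expand_ratio=[1, 4, 4, 1, 1, 1, 1],
    )
    # Extract a standalone model that keeps the activated slice of the weights.
    return supernet.get_active_subnet(preserve_weight=True)
```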

maryanpetruk commented 1 year ago

@dilinwang820 thanks for the response. I am unable to slice the NASViT-A1, A2, A3, A4 models mentioned in the paper (last page) from the supernet. The paper only gives information about MBConv-1, 2, 3 (supernet_config.mb1, 2, 3).

But in the code there are configurations named supernet_config.mb4, 5, 6, 7. What are the expansion ratios and depth values for those MBConv-4 to MBConv-7 blocks, needed to instantiate the NASViT-A1, A2, A3, A4 models?

Could you please provide the full configuration for them? In the paper you mention the Transformer-4 to Transformer-7 block configurations, but it is impossible to set them in the code; there are only the _C.supernet_config.mb# block configurations that one may set to sample a network, as in the sketch below.
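
For concreteness, this is the kind of override I am attempting (a sketch assuming misc/config.py exposes a yacs-style CfgNode, as referenced above; the mb4 values are placeholders, since the actual A1 values are exactly what I am asking for):

```python
# Sketch: overriding supernet config fields before sampling a network.
# Assumes a yacs-style CfgNode in misc/config.py; whether .d/.t hold scalars
# or lists of choices depends on the repo, so these values are placeholders.
from misc.config import _C as cfg

cfg.defrost()
cfg.supernet_config.mb4.d = 3  # placeholder depth for MBConv-4
cfg.supernet_config.mb4.t = 4  # placeholder expand ratio for MBConv-4
cfg.freeze()
```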

ChengyueGongR commented 1 year ago

Hi, for the transformer blocks, instead of sampling an active subnet, we dynamically mask the upsampled channels in the inference network. To get the exact active subnet, you should define the sub-transformer network and load the weights on your own. For the mobile network blocks, we name all of these blocks 'mb' instead of 'transformer', which may be confusing. If you open our checkpoints, the transformer layers start from block.20; the sampling rule is the same as for the mb blocks, the hyper-parameters (d, w) are the same, and we ignore ks for transformer blocks.
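
To illustrate the masking idea, a toy sketch (not the repo's actual implementation): the full-width projection runs, and channels beyond the active width are zeroed at inference time.

```python
import torch

def mask_channels(x: torch.Tensor, active_width: int) -> torch.Tensor:
    """Keep the first `active_width` channels of a (B, C, ...) tensor, zero the rest.

    Toy illustration of masking the upsampled channels at inference instead of
    physically re-building a smaller transformer block.
    """
    mask = torch.zeros(1, x.size(1), *([1] * (x.dim() - 2)),
                       dtype=x.dtype, device=x.device)
    mask[:, :active_width] = 1.0
    return x * mask
```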

maryanpetruk commented 1 year ago

Could you please verify which expand_ratio and depth values are needed to instantiate, e.g., NASViT-A1?

Specifically, the parameter choices for A1:

supernet_config.mb4.d
supernet_config.mb5.d
supernet_config.mb6.d
supernet_config.mb7.d

supernet_config.mb4.t
supernet_config.mb5.t
supernet_config.mb6.t
supernet_config.mb7.t

In the paper you mention that the expand_ratio for transformer blocks 4-7 is 1, but in the supernet design space one can only select from [4, 5, 6] for mb4, mb5 and only [6] for mb6, mb7.

[Screenshot: supernet design-space table]

And if I instantiate the model with such a configuration, the acc1 is 2.154% and the FLOPs count is 188.37. In the paper you state that A1 should have an acc1 of 79.7% at 309 MFLOPs.

{'net_id': 'a1', 'mode': 'evaluate', 'epoch': -1, 'acc1': 2.154, 'acc5': 6.63, 'loss': 9.328627667541504, 'flops': 188.37394000000003, 'params': 7.722412, 'resolution': 192, 'width': [16, 16, 24, 32, 64, 112, 160, 216, 1792], 'kernel_size': [3, 3, 3, 3, 3, 3, 3], 'expand_ratio': [1, 4, 4, 1, 1, 1, 1], 'depth': [1, 3, 3, 4, 3, 3, 3]}

Thank you

ChengyueGongR commented 1 year ago

Hi, this is due to the transformer block design: for example, transformer-7 contains a few transformer layers plus one additional MB-block that changes the width. The expand ratio of 1 is for those transformer layers, while the constraint you quote (only [4, 5, 6] for mb4, mb5 and only [6] for mb6, mb7) applies to the one additional MB-block. For that block the expand ratio is constrained, whereas the transformer layers' expand ratio is set to 1 by default; therefore, you do not need to control the transformer layers' expand ratio through the hyper-parameter configs.

For the accuracy issue, could you first try the `attentive_nas_eval.validate` function in `main.py` to see whether this is an issue with the checkpoint or with the sampling function? Thanks.
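
Roughly along these lines (a sketch modeled on the AttentiveNAS-style evaluation loop; please verify the actual argument list against main.py):

```python
def sanity_check(model, train_loader, val_loader, criterion, args, logger):
    """Run the repo's evaluation path to tell a checkpoint problem apart from a
    sampling problem. Sketch only: the validate() argument list is an assumption
    modeled on AttentiveNAS-style code; check main.py for the real call.
    """
    from evaluate import attentive_nas_eval  # module path is an assumption

    subnets = {'attentive_nas_min_net': {}}  # placeholder: subnet(s) to evaluate
    return attentive_nas_eval.validate(
        subnets, train_loader, val_loader,
        model, criterion, args, logger,
        bn_calibration=True,  # recalibrate BN stats before measuring accuracy
    )
```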