DaiShiResearch / TransNeXt

[CVPR 2024] Code release for TransNeXt model
Apache License 2.0

Parameter setting problem #4

Closed hero-White closed 5 months ago

hero-White commented 7 months ago

In Part D.6 of the article, what should the sr_ratios list be set to when the input size is 256×256, the pool size is 4, and aggregated attention is used in all four stages?

DaiShiResearch commented 7 months ago

Taking transnext_micro as an example, you can use the following configuration to build a model with a pool_size of 4 and an input resolution of $256^2$ that uses aggregated attention in all four stages:

@register_model
def transnext_micro(pretrained=False, **kwargs):
    model = TransNeXt(window_size=[3, 3, 3, 3],
                      patch_size=4, embed_dims=[48, 96, 192, 384], num_heads=[2, 4, 8, 16],
                      mlp_ratios=[8, 8, 4, 4], qkv_bias=True,
                      norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2, 2, 15, 2], sr_ratios=[16, 8, 4, 2],
                      **kwargs)
    model.default_cfg = _cfg()

    return model
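As a quick sanity check (an illustrative sketch, not repo code), the sr_ratios above pool the keys to 4×4 in every stage for a 256×256 input, since stage i downsamples the input by 2^(i+2) and aggregated attention then pools by sr_ratios[i]:

```python
# Illustrative check (not part of the repo): per-stage feature map and
# pooled key sizes for img_size=256, patch_size=4, sr_ratios=[16, 8, 4, 2].
img_size = 256
sr_ratios = [16, 8, 4, 2]

# Stage i feature map side: input downsampled by 2**(i+2).
feature_sizes = [img_size // (2 ** (i + 2)) for i in range(4)]
# Pooled key side in each stage: feature size divided by sr_ratios[i].
pool_sizes = [f // sr for f, sr in zip(feature_sizes, sr_ratios)]

print(feature_sizes)  # [64, 32, 16, 8]
print(pool_sizes)     # [4, 4, 4, 4]
```

Every stage ends up with a 4×4 pooled key map, which is what pool_size = 4 means here.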

Additionally, you’ll need to adjust the calculation of relative_pos_index and relative_coords_table in the model as follows:

relative_pos_index, relative_coords_table = get_relative_position_cpb(query_size=to_2tuple(img_size // (2 ** (i + 2))),
                                                                      key_size=to_2tuple(img_size // ((2 ** (i + 2)) * sr_ratios[i])),
                                                                      pretrain_size=to_2tuple(pretrain_size // (2 ** (i + 2))))

This change is necessary because the previously released version defaults to a pooled feature size of 1/32 of the input image side, whereas this configuration sets it to 1/64.
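The 1/32 vs. 1/64 ratio follows from simple arithmetic (a hedged check; the 224-pixel input and pool_size of 7 for the released default are assumptions based on the repo's standard pretraining setup, not stated in this thread):

```python
from fractions import Fraction

# Assumed released default: 224x224 input with a 7x7 pooled key map.
released_ratio = Fraction(7, 224)
# This thread's configuration: 256x256 input with a 4x4 pooled key map.
new_ratio = Fraction(4, 256)

print(released_ratio)  # 1/32
print(new_ratio)       # 1/64
```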