ViTAE-Transformer / ViTDet

Unofficial implementation for [ECCV'22] "Exploring Plain Vision Transformer Backbones for Object Detection"
Apache License 2.0

Does ViTDet still adopt FPN rather than the simple pyramid structure in the ViTDet paper? #14

Open larenzhang opened 2 years ago

larenzhang commented 2 years ago

In the config file https://github.com/ViTAE-Transformer/ViTDet/blob/main/configs/ViTDet/ViTDet-ViT-Base-100e.py, it seems that the ViTDet in your implementation still adopts FPN modules rather than the simple pyramid structure proposed in the ViTDet paper.

Annbless commented 2 years ago

Hi, we modified the FPN file to add an extra option, use_residual=False, which disables the residual connection in the FPN module. With this option set, the module only serves to change the feature dimension, as described for the simple pyramid structure.
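
For readers unfamiliar with the flag, here is a rough sketch of the effect described above. It is not the repository's code: it assumes that "residual connection" refers to FPN's top-down addition between levels, the function name `merge_levels` is hypothetical, and plain numbers stand in for feature maps (upsampling and the lateral convolutions are omitted).

```python
def merge_levels(laterals, use_residual=True):
    """Combine per-level features, ordered from finest to coarsest.

    Hypothetical helper for illustration only; scalars stand in for
    feature maps that have already been projected to the output width.
    """
    outs = list(laterals)
    if use_residual:
        # Assumed meaning of the residual connection: the usual FPN
        # top-down pathway, where each finer level adds the coarser one.
        for i in range(len(outs) - 1, 0, -1):
            outs[i - 1] = outs[i - 1] + outs[i]
    # With use_residual=False every level is returned unchanged, so each
    # output depends only on its own input, as in a simple pyramid.
    return outs


print(merge_levels([1, 2, 3, 4], use_residual=True))   # [10, 9, 7, 4]
print(merge_levels([1, 2, 3, 4], use_residual=False))  # [1, 2, 3, 4]
```

Under this reading, setting use_residual=False leaves the levels independent of one another, which would match the "feature dimension change only" behavior mentioned in the reply.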