NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT
Apache License 2.0
5.87k stars 893 forks

[Question] Support for HuggingFace SwinV2 #426

Open IvensaMDH opened 1 year ago

IvensaMDH commented 1 year ago

Hello,

Recently support for SwinV2 models released under SwinTransformer v2.0.0 has been added to FT. Would it be possible to add support for Swin models trained with the Transformers library such as swinv2-tiny-patch4-window8-256?

As far as I can tell, these models have slight differences compared to the originally released models such as layer naming:

For the original model: [screenshot of the original checkpoint's layer names]

For the HuggingFace model (classification): [screenshot of the HF checkpoint's layer names]

Would it be feasible to apply FT to the latter model and if so, how?

Thanks,

/M

byshiue commented 1 year ago

If the model architecture is exactly the same, you only need to pass the HuggingFace weights in the correct order.
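Passing the HF weights "in the correct order" amounts to renaming the HF state_dict keys to the names the original Swin code expects. A minimal sketch using regex renames over plain dicts; the two rules below are purely illustrative assumptions, and the real mapping has to be read off by printing both models' state_dict keys side by side:

```python
import re

# Hypothetical rename rules -- illustrative only. Derive the real ones by
# comparing the key names of both checkpoints.
RENAME_RULES = [
    (r"\.attention\.self\.", ".attn."),          # HF-style -> Swin-style attention prefix
    (r"^swinv2\.encoder\.layers\.", "layers."),  # strip the HF wrapper prefix
]

def remap_keys(state_dict, rules=RENAME_RULES):
    """Return a new dict whose keys are rewritten by the given regex rules."""
    out = {}
    for key, tensor in state_dict.items():
        for pattern, repl in rules:
            key = re.sub(pattern, repl, key)
        out[key] = tensor
    return out
```

The tensors themselves are untouched; only the key names change, so the remapped dict can be saved with torch.save and fed to the builder as usual.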

IvensaMDH commented 1 year ago

Hi,

It seems the HF architecture is slightly different...

However, even when trying with the pretrained model from here, I still get an error:

[ERROR][SwinTransformerWeights::__init__] missing weight layers.0.blocks.0.attn.relative_position_index or layers.0.blocks.0.attn.relative_coords_table or layers.0.blocks.0.attn.cpb_mlp.0.weight or layers.0.blocks.0.attn.cpb_mlp.0.bias or layers.0.blocks.0.attn.cpb_mlp.2.weight.

The model is a SwinV2 model:

import timm
import torch

model = timm.create_model('swinv2_tiny_window8_256', pretrained=True, num_classes=2)

# ... fine-tuning ...

model.eval()
torch.save({"model": model.state_dict()}, "model.pt")

and I get the above-mentioned error when running:

python builder.py \
    --batch-size 8 \
    --cfg ../../pytorch/swin/Swin-Transformer-Quantization/SwinTransformer/configs/swinv2/swinv2_tiny_patch4_window8_256.yaml \
    --resume ../../../model.pt \
    --th-path ../../../build/lib/libth_transformer.so \
    --version 2 \
    --fp16 \
    --output model.engine

What am I doing wrong?

Thanks!
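One way to narrow down missing-weight errors like the one above is to diff the key sets of the saved checkpoint against a checkpoint the builder accepts. A minimal sketch with plain dicts, assuming both checkpoints have already been loaded (e.g. via `torch.load(...)["model"]`):

```python
def diff_keys(saved_keys, reference_keys):
    """Compare two iterables of state_dict key names.

    Returns (missing, extra): keys present in the reference but absent from
    the saved checkpoint, and keys the saved checkpoint has that the
    reference does not.
    """
    saved, reference = set(saved_keys), set(reference_keys)
    return sorted(reference - saved), sorted(saved - reference)

# Toy key names for illustration; in practice pass
# torch.load("model.pt")["model"].keys() and the reference .pth's keys.
missing, extra = diff_keys(
    ["layers.0.blocks.0.attn.qkv.weight"],
    ["layers.0.blocks.0.attn.qkv.weight",
     "layers.0.blocks.0.attn.relative_position_index"],
)
```

The `missing` list should line up exactly with the names in the builder's error message, which tells you whether the problem is dropped buffers or a naming mismatch.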

byshiue commented 1 year ago

Do you encounter this error when using the checkpoint https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_tiny_patch4_window8_256.pth? For example, running

python builder.py \
    --batch-size 32 \
    --cfg ../../pytorch/swin/Swin-Transformer-Quantization/SwinTransformer/configs/swinv2/swinv2_tiny_patch4_window8_256.yaml \
    --resume ../../pytorch/swin/Swin-Transformer-Quantization/swinv2_tiny_patch4_window8_256.pth \
    --th-path ../../../build/lib/libth_transformer.so \
    --version 2 \
    --fp16 \
    --output swin_transformer_fp16_v2.engine

IvensaMDH commented 1 year ago

When using the checkpoint you suggest, everything works.

However, model = timm.create_model('swinv2_tiny_window8_256', pretrained=True, num_classes=2) also uses that exact checkpoint:

[screenshot showing timm downloading the same checkpoint]

For some reason, do the above-mentioned layers get lost during training, or are they not saved correctly?

Thanks in advance,

EDIT:

It looks like the above layers get removed in SwinV2's load_pretrained: https://github.com/microsoft/Swin-Transformer/blob/eda255cdfb1f9ac2cb79cd0b45cabc614df42c3d/utils.py#L41

byshiue commented 1 year ago

When you use create_model, it may rename some weights, remove some unused weights, or add some additional weights.

IvensaMDH commented 1 year ago

Would you suggest that I just re-initialize these buffers and add them back to the state_dict before saving?
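If the dropped entries really are the deterministic buffers (relative_position_index, relative_coords_table), one option is to copy them back from a freshly created model before saving. A minimal sketch with plain dicts, assuming a reference state_dict is available (e.g. from timm.create_model('swinv2_tiny_window8_256', pretrained=True).state_dict()):

```python
# Suffixes of the buffers reported missing by the FT builder. These are
# recomputed from the window geometry, not learned, so copying them from a
# fresh model of the same config is safe.
MISSING_SUFFIXES = (
    "relative_position_index",
    "relative_coords_table",
)

def restore_missing(state_dict, reference_state_dict, suffixes=MISSING_SUFFIXES):
    """Copy entries whose names end in `suffixes` from the reference
    state_dict into a copy of `state_dict` when they are absent."""
    restored = dict(state_dict)
    for key, value in reference_state_dict.items():
        if key not in restored and key.endswith(suffixes):
            restored[key] = value
    return restored
```

Note this is only safe for buffers that do not depend on training. The cpb_mlp weights named in the error are trained parameters, so if those are genuinely absent from the fine-tuned checkpoint, the problem is upstream of saving and copying them from a pretrained model would silently discard the fine-tuning.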