bytedance / lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation

How to implement in vision transformers? #81

Open memorywxy opened 3 years ago

memorywxy commented 3 years ago

Thanks for the marvelous work! It seems LightSeq only integrates its operators into NLP models. Does it support vision transformers? What do we need to do to adopt LightSeq in vision transformer models such as ViT or Swin Transformer?

Taka152 commented 3 years ago

@memorywxy Thanks, that is a good question. We have also noticed the success of transformer models in CV. I think LightSeq should be able to support the ViT model, because it appears to use the same transformer block as the one in the "Attention Is All You Need" paper.

We haven't tried it yet, but we are working on making LightSeq useful for CV users as well.

Taka152 commented 3 years ago

It has now been confirmed in our in-house business that LightSeq can accelerate the ViT model by 2-3x. I'll try to add a ViT example next month.

For those who want to try it themselves: it can be done by converting the encoder layers of a PyTorch ViT model to lightseq.training.LSTransformerEncoderLayer, using the following code with some modifications. https://github.com/bytedance/lightseq/blob/f0a9cc7f6ff44ef5db8d8d568805b5815fc85165/examples/training/huggingface/ls_hf_transformer_encoder_layer.py#L61-L67
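For reference, here is a rough, untested sketch of what that conversion might look like for a HuggingFace ViT model. The `get_config` fields follow the linked HuggingFace encoder example; the `LSViTEncoderLayer`/`inject_ls_layers` names, the `model.vit.encoder.layer` path, and the mask handling are assumptions that should be checked against your lightseq and transformers versions.

```python
import torch
from transformers import ViTForImageClassification
from lightseq.training import LSTransformerEncoderLayer


class LSViTEncoderLayer(LSTransformerEncoderLayer):
    """Adapt LightSeq's encoder layer to the call convention of HF's ViTLayer."""

    def forward(self, hidden_states, *args, **kwargs):
        # ViT has no padding tokens, so pass an all-zero padding mask.
        bsz, seq_len, _ = hidden_states.size()
        padding_mask = torch.zeros(
            bsz, seq_len, dtype=hidden_states.dtype, device=hidden_states.device
        )
        out = super().forward(hidden_states, padding_mask)
        return (out,)  # HF encoder layers return a tuple


def inject_ls_layers(model, fp16=False, local_rank=0):
    """Replace every HF ViTLayer with a LightSeq layer (randomly initialized here;
    copying the pretrained weights would need the initial_weights/initial_biases
    arguments, as in the linked HuggingFace example)."""
    cfg = model.config
    num_patches = (cfg.image_size // cfg.patch_size) ** 2 + 1  # patches + [CLS]
    for i in range(cfg.num_hidden_layers):
        ls_config = LSTransformerEncoderLayer.get_config(
            max_batch_tokens=8192,             # tune to batch_size * seq_len
            max_seq_len=num_patches,
            hidden_size=cfg.hidden_size,
            intermediate_size=cfg.intermediate_size,
            nhead=cfg.num_attention_heads,
            attn_prob_dropout_ratio=cfg.attention_probs_dropout_prob,
            activation_dropout_ratio=cfg.hidden_dropout_prob,
            hidden_dropout_ratio=cfg.hidden_dropout_prob,
            pre_layer_norm=True,               # ViT blocks are pre-LayerNorm
            activation_fn="gelu",
            fp16=fp16,
            local_rank=local_rank,
        )
        model.vit.encoder.layer[i] = LSViTEncoderLayer(ls_config)
    return model


# Usage (LightSeq layers run on GPU only):
# model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
# model = inject_ls_layers(model).cuda()
```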

Jack47 commented 3 years ago

Any updates on this? Still waiting for your ViT example~ @Taka152 @godweiyang

Taka152 commented 3 years ago

Thanks for the reminder, maybe next month.

zero0kiriyu commented 2 years ago

A few months have passed, any update?

Taka152 commented 2 years ago

Thanks for the reminder. Since Hugging Face has released an example for image classification, we may provide an example based on it in the future.

xs1997zju commented 2 years ago

any updates?