memorywxy opened this issue 3 years ago
@memorywxy Thanks, that's a good question. We have also noticed the success of transformer models in CV. I think LightSeq should be able to support the ViT model, since it uses the same transformer block as the "Attention Is All You Need" paper.
We haven't tried it yet, but we are working on making LightSeq useful for CV users as well.
Our in-house workloads have confirmed that LightSeq can accelerate the ViT model by 2-3x. I'll try to add a ViT example next month.
For those who want to try it themselves: it can be done by converting the encoder layers in the torch ViT model to lightseq.training.LSTransformerEncoderLayer, using the following code with some modifications.
https://github.com/bytedance/lightseq/blob/f0a9cc7f6ff44ef5db8d8d568805b5815fc85165/examples/training/huggingface/ls_hf_transformer_encoder_layer.py#L61-L67
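As a rough illustration of what the linked snippet does, the sketch below walks a model's encoder layer list and swaps each layer for a LightSeq one, carrying the original weights over. The classes here are lightweight stand-ins I made up for illustration; in a real conversion you would use the Hugging Face ViT encoder layer and `lightseq.training.LSTransformerEncoderLayer` (built from a config via its `get_config`), as the linked code shows.

```python
# Hypothetical stand-ins: in a real conversion these would be the
# Hugging Face ViT encoder layer and
# lightseq.training.LSTransformerEncoderLayer respectively.
class TorchEncoderLayer:
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size


class LSEncoderLayer:
    """Stand-in for lightseq.training.LSTransformerEncoderLayer."""
    def __init__(self, hidden_size, weights=None):
        self.hidden_size = hidden_size
        # The real code initializes the LightSeq layer with the trained
        # weights extracted from the original torch layer.
        self.weights = weights


class ViTEncoder:
    """Minimal stand-in for a ViT encoder holding a list of layers."""
    def __init__(self, num_layers, hidden_size):
        self.layer = [TorchEncoderLayer(hidden_size) for _ in range(num_layers)]


def inject_ls_layers(encoder):
    """Replace each torch encoder layer with a LightSeq layer in place,
    passing the original layer along so its weights can be reused."""
    for i, old in enumerate(encoder.layer):
        encoder.layer[i] = LSEncoderLayer(old.hidden_size, weights=old)
    return encoder


encoder = ViTEncoder(num_layers=12, hidden_size=768)
inject_ls_layers(encoder)
print(all(isinstance(l, LSEncoderLayer) for l in encoder.layer))  # True
```

The point is only the replacement pattern: locate the encoder layers, build a LightSeq layer per original layer from its weights, and assign it back into the model.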
Any updates on this? Still waiting for your ViT example~ @Taka152 @godweiyang
Thanks for the reminder, maybe next month.
A few months have passed, any update?
Thanks for the reminder. Since Hugging Face has released an example for image classification, we may provide an example based on it in the future.
Any updates?
Thanks for the marvelous work! It seems LightSeq only integrates its operators into NLP models. Does it support vision transformers? What do we need to do to adopt LightSeq in vision transformer models such as ViT or Swin Transformer?