Tencent / TurboTransformers

A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT2, decoders, etc.) on CPU and GPU.

Is plain transformer decoder already supported? #179

Closed · auspicious3000 closed this issue 4 years ago

auspicious3000 commented 4 years ago

The documentation indicates that the transformer decoder from OpenNMT is supported. However, the decoder benchmark page (https://github.com/Tencent/TurboTransformers/blob/master/docs/decoder.md) says "We are still working on decoder model optimization." Is acceleration for a plain transformer decoder supported at this time? If so, how large is the performance gain? Thanks!

The documentation says the plain transformer decoder is supported, but the benchmark page says it is still under development. Does TurboTransformers accelerate a plain decoder at present, and if so, by roughly how much? I am not using a standard model such as BERT or GPT, but a plain decoder similar to the one in "Attention Is All You Need". Many thanks!

feifeibear commented 4 years ago

The decoder is supported. As an illustration, it is used in a translation application; see https://github.com/TurboNLP/Translate-Demo for details. For decoder inference, Turbo's speedups over PyTorch range from 1.85x to 2.51x; see Figure 10 of https://arxiv.org/pdf/2010.05680.pdf for the detailed results.
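
For anyone who wants to try a plain decoder layer directly, here is a minimal timing sketch. The `turbo_transformers.TransformerDecoderLayer.from_onmt` conversion entry point, its forward signature, and the mask conventions are assumptions based on the repo layout and the Translate-Demo example; check the current sources for the exact API.

```python
# Minimal sketch: time an OpenNMT-py decoder layer against its assumed
# TurboTransformers counterpart. Names marked "assumed" are not verified
# against a specific release; adjust to the version you have installed.
import time

import torch
import turbo_transformers
from onmt.decoders.transformer import TransformerDecoderLayer

d_model, heads, d_ff = 512, 8, 2048
batch, tgt_len, src_len = 1, 32, 64

# A plain "Attention Is All You Need"-style decoder layer from OpenNMT-py.
onmt_layer = TransformerDecoderLayer(
    d_model, heads, d_ff, dropout=0.0, attention_dropout=0.0
).eval()

# Convert the weights into a TurboTransformers layer (assumed entry point,
# mirroring the from_onmt converters in turbo_transformers/python/.../layers).
turbo_layer = turbo_transformers.TransformerDecoderLayer.from_onmt(onmt_layer)

inputs = torch.rand(batch, tgt_len, d_model)
memory = torch.rand(batch, src_len, d_model)
# Padding masks in OpenNMT-py convention: True marks padded positions.
src_pad_mask = torch.zeros(batch, 1, src_len, dtype=torch.bool)
tgt_pad_mask = torch.zeros(batch, 1, tgt_len, dtype=torch.bool)

def bench(layer, n=100):
    """Average seconds per forward pass over n iterations."""
    with torch.no_grad():
        start = time.time()
        for _ in range(n):
            layer(inputs, memory, src_pad_mask, tgt_pad_mask)
        return (time.time() - start) / n

print(f"onmt : {bench(onmt_layer) * 1e3:.2f} ms/iter")
print(f"turbo: {bench(turbo_layer) * 1e3:.2f} ms/iter")
```

The ratio between the two timings is what the 1.85x-2.51x figures in the paper refer to; the actual number depends on sequence length, batch size, and hardware.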