THUNLP-MT / THUMT

An open-source neural machine translation toolkit developed by Tsinghua Natural Language Processing Group
BSD 3-Clause "New" or "Revised" License

Add encdec_attention cache to transformer.py to speed up inference. #116

Open liushaokong opened 2 years ago

liushaokong commented 2 years ago
  1. Add an encdec_attention cache to model/transformer.py; caching the encoder-side projections helps speed up inference.
  2. When converting the PyTorch model (model.pt) to ONNX models (as in fastt5), it is necessary to show how the encdec attention cache is used.
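The idea behind the request can be sketched as follows: in encoder-decoder attention, the key/value projections depend only on the encoder output, so they can be computed once and reused at every decoding step instead of being recomputed. This is a minimal, hypothetical illustration (class and parameter names are my own, not THUMT's actual API):

```python
import torch
import torch.nn as nn


class CachedEncDecAttention(nn.Module):
    """Sketch of encoder-decoder attention with a key/value cache.

    The encoder memory never changes during decoding, so its K/V
    projections are computed on the first step and cached.
    """

    def __init__(self, hidden_size, num_heads):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        self.k_proj = nn.Linear(hidden_size, hidden_size)
        self.v_proj = nn.Linear(hidden_size, hidden_size)
        self.o_proj = nn.Linear(hidden_size, hidden_size)

    def _split_heads(self, x):
        # (batch, time, hidden) -> (batch, heads, time, head_dim)
        b, t, _ = x.shape
        return x.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, query, memory, cache=None):
        # The query projection must be recomputed every step.
        q = self._split_heads(self.q_proj(query))

        if cache is None or "k" not in cache:
            # First decoding step: project the encoder output once.
            k = self._split_heads(self.k_proj(memory))
            v = self._split_heads(self.v_proj(memory))
            if cache is not None:
                cache["k"], cache["v"] = k, v
        else:
            # Subsequent steps: reuse the cached projections.
            k, v = cache["k"], cache["v"]

        scores = torch.matmul(q, k.transpose(-2, -1)) / self.head_dim ** 0.5
        attn = torch.softmax(scores, dim=-1)
        out = torch.matmul(attn, v).transpose(1, 2).reshape(query.shape)
        return self.o_proj(out)
```

With this structure, the cache dict can also be exposed as an explicit input/output when exporting the decoder to ONNX (the approach fastt5 takes for T5), since ONNX graphs cannot hold mutable Python state across calls.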