OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Apply ctranslate2 to KNN-MT #1100

Open BrightXiaoHan opened 1 year ago

BrightXiaoHan commented 1 year ago

Hi! I want to apply CTranslate2 to KNN-MT (there are some PyTorch implementations, for example knn-box and sockeye). Is there an interface to get the output hidden states of the model so that I can do vector retrieval? In addition, KNN-MT needs a vector retrieval at every decoding step, so decoding has to proceed token by token, whereas CTranslate2 currently only provides an interface that decodes the whole sentence at once. Would it be possible to provide an interface that reuses the encoder output at each decoding step, to avoid redundant computation?
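
To make the per-step requirement concrete, here is a minimal pure-Python sketch of what a single KNN-MT decoding step does with the hidden state: look up the nearest datastore entries, turn their distances into a distribution over target tokens, and interpolate it with the model's own distribution. Everything in it is an assumption made for illustration: the function names `knn_distribution` and `interpolate`, the toy 2-dimensional datastore, and the values of `k`, `temperature`, and `lam` are hypothetical, and none of it is part of the CTranslate2 API. A real setup would use the decoder's hidden state as the query and an approximate nearest-neighbour index instead of brute force.

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k=2, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over tokens.

    `datastore` is a list of (key_vector, target_token) pairs (toy data here).
    """
    # Brute-force squared L2 distance between the query and every stored key.
    scored = [
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    ]
    neighbours = sorted(scored, key=lambda item: item[0])[:k]

    # Softmax over negative distances; shifting by the smallest distance
    # keeps the exponentials numerically stable.
    d_min = neighbours[0][0]
    weights = [(math.exp(-(d - d_min) / temperature), tok) for d, tok in neighbours]
    total = sum(w for w, _ in weights)

    # Neighbours that share a target token pool their probability mass.
    probs = defaultdict(float)
    for w, tok in weights:
        probs[tok] += w / total
    return dict(probs)


def interpolate(model_probs, knn_probs, lam=0.5):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_model."""
    vocab = set(model_probs) | set(knn_probs)
    return {
        tok: lam * knn_probs.get(tok, 0.0) + (1 - lam) * model_probs.get(tok, 0.0)
        for tok in vocab
    }


# Toy example: a stand-in "hidden state" and a three-entry datastore.
datastore = [((0.0, 0.0), "cat"), ((1.0, 0.0), "dog"), ((5.0, 5.0), "car")]
hidden_state = (0.0, 0.0)
model_probs = {"cat": 0.5, "dog": 0.3, "car": 0.2}

knn_probs = knn_distribution(hidden_state, datastore, k=2, temperature=1.0)
final_probs = interpolate(model_probs, knn_probs, lam=0.5)
next_token = max(final_probs, key=final_probs.get)
```

Because the query changes with every generated token, this lookup has to run once per decoding step, which is why access to the per-step hidden state (and a reusable encoder output) matters here.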

guillaumekln commented 1 year ago

There are more and more requests to access intermediate outputs. However, everyone wants to access different outputs: the attention weights, the decoder outputs, the output logits, etc. We can't effectively support all these use cases from Python, but what you are describing should already be possible from C++.