Open BrightXiaoHan opened 1 year ago
There are more and more requests to access intermediate outputs. However, everyone wants a different output: the attention weights, the decoder outputs, the output logits, etc. We can't effectively support all of these use cases from Python, but what you are describing should already be possible from C++.
Hi! I want to apply ctranslate2 to KNN-MT (there are some PyTorch implementations, for example knn-box and sockeye). Is there an interface to get the model's output hidden states so that I can do vector retrieval? In addition, KNN-MT needs to run a vector retrieval at every decoding step, so the sentence has to be decoded token by token, while ctranslate2 currently only provides an interface that decodes the whole sentence at once. Would it be possible to provide an interface that reuses the encoder output at each decoding step, to avoid redundant computation?
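For readers unfamiliar with KNN-MT, here is a minimal sketch of what the per-step retrieval involves, which is why the decoder hidden state is needed at every step. It does not use any ctranslate2 API. The function names (`knn_distribution`, `interpolate`), the toy datastore, and the parameters `k`, `temperature`, and `lam` are all assumptions made for illustration; a real setup would likely use an approximate nearest-neighbour index rather than a brute-force search.

```python
import math

def knn_distribution(query, keys, values, vocab_size, k=2, temperature=1.0):
    """Turn the k nearest datastore entries into a distribution over the vocabulary.

    query:  decoder hidden state at the current step (list of floats)
    keys:   stored hidden states; values: the target token id stored with each key
    """
    # Squared L2 distance from the query to every stored key (brute force).
    dists = [sum((q - x) ** 2 for q, x in zip(query, key)) for key in keys]
    nearest = sorted(range(len(keys)), key=lambda i: dists[i])[:k]
    # Softmax over negative distances, shifted by the minimum for stability.
    d_min = min(dists[i] for i in nearest)
    weights = [math.exp(-(dists[i] - d_min) / temperature) for i in nearest]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for i, w in zip(nearest, weights):
        probs[values[i]] += w / total  # neighbours with the same token add up
    return probs

def interpolate(p_model, p_knn, lam=0.5):
    """Mix the model's next-token distribution with the retrieval distribution."""
    return [(1 - lam) * m + lam * n for m, n in zip(p_model, p_knn)]

# Toy datastore: three stored hidden states and the token that followed each.
keys = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
values = [2, 1, 2]

hidden = [0.1, 0.0]          # hypothetical decoder hidden state for this step
p_model = [0.2, 0.5, 0.3]    # hypothetical model softmax over a 3-token vocab

p_knn = knn_distribution(hidden, keys, values, vocab_size=3)
p_final = interpolate(p_model, p_knn)
next_token = max(range(len(p_final)), key=lambda t: p_final[t])
```

In this toy example the model alone would pick token 1, while the interpolated distribution picks token 2 because the nearest stored hidden state was followed by that token. Running this at every step is what requires both the hidden state and step-by-step control over decoding.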