google-research / multilingual-t5

Apache License 2.0

How to access the outputs before sentencepiece detokenisation? #88

Open pasricha opened 3 years ago

pasricha commented 3 years ago

Hi there,

Is there a way to look at the output tokens before they are detokenised? Right now I am using `model.predict()` (as shown in the t5-trivia example) to generate outputs from a sequence-to-sequence model, but it only writes the detokenised text to a file.

I have also tried inspecting the result inside the `decode` method, but it too returns output that has already been detokenised. I want to see the output token IDs before SentencePiece detokenisation. How can I do this?

Thanks
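A minimal sketch of one possible workaround, assuming the prediction code path calls a `decode(ids)` method on whatever vocabulary object it is given: wrap the vocabulary so it records the raw token IDs before delegating to the real detokeniser. `RecordingVocabulary` and `ToyVocabulary` are hypothetical names, not part of the `t5` library. The toy vocabulary only stands in for a real SentencePiece model, and whether `model.predict()` lets you pass in such a wrapper should be checked against the installed `t5`/`mesh_tensorflow` source.

```python
from typing import List, Sequence


class RecordingVocabulary:
    """Hypothetical wrapper that records token IDs before delegating decode().

    `inner` is assumed to be any object exposing `decode(ids) -> str`;
    this is an assumption, not a documented t5 interface.
    """

    def __init__(self, inner):
        self._inner = inner
        self.recorded_ids: List[List[int]] = []

    def decode(self, ids: Sequence[int]) -> str:
        # Keep a copy of the raw IDs exactly as they arrive, before detokenisation.
        self.recorded_ids.append([int(i) for i in ids])
        return self._inner.decode(ids)

    def __getattr__(self, name):
        # Forward any other attribute (vocab_size, encode, ...) to the wrapped vocabulary.
        return getattr(self._inner, name)


class ToyVocabulary:
    """Hypothetical stand-in for a SentencePiece vocabulary, for illustration only."""

    def __init__(self, pieces: List[str]):
        self._pieces = pieces

    def decode(self, ids: Sequence[int]) -> str:
        # Concatenate pieces and turn the SentencePiece word-boundary marker into spaces.
        text = "".join(self._pieces[i] for i in ids)
        return text.replace("\u2581", " ").strip()


vocab = RecordingVocabulary(
    ToyVocabulary(["<pad>", "</s>", "<unk>", "\u2581hello", "\u2581world"])
)
text = vocab.decode([3, 4])
print(vocab.recorded_ids)  # [[3, 4]]
print(text)                # hello world
```

The wrapper leaves the decoded text unchanged, so anything downstream that writes predictions to a file keeps working, while `recorded_ids` holds the IDs as they were before detokenisation.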