Our NEKO vocabulary includes non-text tokens, such as discrete action tokens and continuous action tokens, alongside text tokens. When generating text for tasks such as image-caption and pure text (and perhaps others), we need to investigate whether the generated tokens should be restricted to the text-token range of the vocabulary.
I am leaning toward adding this restriction, and will implement it in the image-caption task first and test it out.
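
One possible way to apply the restriction is to mask the logits before picking the next token, so that ids outside the text range can never be selected. The sketch below is only an illustration and not NEKO's actual code: it assumes text tokens occupy one contiguous id range, and the names `mask_to_text_range`, `greedy_pick`, `TEXT_START`, and `TEXT_END`, as well as the toy vocabulary layout, are hypothetical.

```python
import math


def mask_to_text_range(logits, text_start, text_end):
    """Return a copy of logits with ids outside [text_start, text_end) set to -inf."""
    return [
        value if text_start <= token_id < text_end else -math.inf
        for token_id, value in enumerate(logits)
    ]


def greedy_pick(logits):
    """Return the id of the highest-scoring token."""
    return max(range(len(logits)), key=logits.__getitem__)


# Hypothetical layout: ids 0-4 are text tokens, ids 5-7 are action tokens.
TEXT_START, TEXT_END = 0, 5
logits = [0.1, 1.2, -0.3, 0.8, 0.0, 3.5, 2.0, -1.0]

masked = mask_to_text_range(logits, TEXT_START, TEXT_END)
unrestricted = greedy_pick(logits)  # may select an action token
restricted = greedy_pick(masked)    # always falls inside the text range
```

The same idea would carry over to sampling: applying softmax to the masked logits gives zero probability to every non-text id. Comparing caption quality with and without the mask would be one way to test whether the restriction is needed.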