Currently there are a few issues with the prefill / generate implementation:
Prefill does not use self._sample to do sampling.
Prefill already returns a token, so the first call to generate should return the second generated token, but it currently returns the first token again. This is historical but quite unintuitive.
As reported by @tengomucho
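To make the expected behavior concrete, here is a minimal sketch of what a fix could look like. Everything in it is an assumption for illustration: `ToyGenerator`, `toy_logits`, and the greedy `_sample` are hypothetical stand-ins, not the actual generator code. The sketch only shows prefill and generate sharing one `_sample` helper, and the first generate call returning the second token.

```python
from typing import Callable, List


class ToyGenerator:
    """Hypothetical stand-in for the real generator, for illustration only."""

    def __init__(self, next_logits: Callable[[List[int]], List[float]]):
        # next_logits is an assumed callable: token ids -> logits for the next token.
        self._next_logits = next_logits
        self._tokens: List[int] = []

    def _sample(self, logits: List[float]) -> int:
        # Single sampling path shared by prefill and generate.
        # Greedy argmax here; the real _sample may apply other strategies.
        return max(range(len(logits)), key=logits.__getitem__)

    def prefill(self, prompt: List[int]) -> int:
        # Process the prompt and return the FIRST generated token.
        self._tokens = list(prompt)
        token = self._sample(self._next_logits(self._tokens))
        self._tokens.append(token)
        return token

    def generate(self) -> int:
        # Continue from the state left by prefill, so the first call
        # returns the SECOND generated token rather than repeating the first.
        token = self._sample(self._next_logits(self._tokens))
        self._tokens.append(token)
        return token


def toy_logits(tokens: List[int], vocab_size: int = 5) -> List[float]:
    # Assumed toy model: always favors (last token + 1) modulo vocab_size.
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


gen = ToyGenerator(toy_logits)
first = gen.prefill([0])   # first generated token
second = gen.generate()    # second generated token, not a repeat of the first
print(first, second)       # prints "1 2"
```

Routing both methods through the same helper keeps the sampling configuration in one place, so prefill cannot drift from generate.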