Issues with prefill & generate

AI-Hypercomputer / jetstream-pytorch

PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference

Apache License 2.0


Open. qihqi opened this issue 3 months ago.

qihqi commented 3 months ago

As reported by @tengomucho

There are currently a few issues with the prefill / generate implementation:

Prefill does not use self._sample for sampling.
Prefill returns a token, so the first call to generate should return the second generated token; currently it returns the first token again. This behavior is historical but quite unintuitive.
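A minimal, framework-free sketch of the contract these two points ask for: prefill samples the first token through the same _sample method that generate uses, and generate then returns the token that follows it rather than repeating the first one. Everything here is hypothetical and for illustration only: ToyEngine, DecodeState, toy_logits, the tiny vocabulary, and greedy sampling are assumptions, not the jetstream-pytorch API, which works on batched tensors and KV caches.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

VOCAB_SIZE = 8  # hypothetical toy vocabulary size


def toy_logits(tokens: List[int]) -> List[float]:
    """Hypothetical stand-in for a model forward pass: favours (last token + 1)."""
    target = (tokens[-1] + 1) % VOCAB_SIZE
    return [1.0 if i == target else 0.0 for i in range(VOCAB_SIZE)]


@dataclass
class DecodeState:
    # Prompt tokens followed by every token sampled so far.
    tokens: List[int] = field(default_factory=list)


class ToyEngine:
    def _sample(self, logits: List[float]) -> int:
        # Greedy sampling for the sketch; a real engine may apply temperature/top-k here.
        return max(range(len(logits)), key=lambda i: logits[i])

    def prefill(self, prompt: List[int]) -> Tuple[int, DecodeState]:
        # The first token goes through the same _sample path that generate uses.
        first = self._sample(toy_logits(prompt))
        return first, DecodeState(tokens=list(prompt) + [first])

    def generate(self, state: DecodeState) -> Tuple[int, DecodeState]:
        # The first token is already recorded in state, so this yields the *next* token.
        nxt = self._sample(toy_logits(state.tokens))
        return nxt, DecodeState(tokens=state.tokens + [nxt])


engine = ToyEngine()
first, state = engine.prefill([3])
second, state = engine.generate(state)
print(first, second)  # 4 5
```

Routing both paths through one _sample keeps sampling parameters consistent between the first token and the rest, and treating prefill's token as already emitted means a caller can simply concatenate prefill's output with each generate output.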