Add built-in support for producing multiple generations per prompt. This leaves room for more efficient generation on backends like HF, since the prompt only needs to be encoded once for all samples. It could also allow caching multiple generations, ideally in a clever way that lets you grow the number of generations and reuse prior caches.
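A minimal sketch of both ideas, assuming a hypothetical backend that separates prompt encoding from sampling. `ToyBackend`, `MultiGenCache`, `generate_n`, and the other names here are illustrative, not an existing API. The backend encodes the prompt once per request and samples all requested generations from that shared encoding. The cache stores generations per (prompt, temperature), so asking for more generations later only samples the missing ones and returns the earlier ones as a prefix.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToyBackend:
    """Hypothetical backend that separates prompt encoding from sampling."""

    encode_calls: int = 0  # counts how often a prompt is encoded

    def encode(self, prompt: str) -> Tuple[int, ...]:
        # Stand-in for tokenization plus the expensive forward pass over the prompt.
        self.encode_calls += 1
        return tuple(ord(c) for c in prompt)

    def sample(self, encoded: Tuple[int, ...], temperature: float, rng: random.Random) -> str:
        # Stand-in for decoding one continuation from the shared encoded prompt.
        # A real backend would use `encoded` and `temperature`; this toy ignores them.
        return f"gen-{rng.randrange(10**6)}"

    def generate_n(self, prompt: str, n: int, temperature: float, rng: random.Random) -> List[str]:
        encoded = self.encode(prompt)  # encode once, reuse for all n samples
        return [self.sample(encoded, temperature, rng) for _ in range(n)]


class MultiGenCache:
    """Caches generations per (prompt, temperature) and only samples the missing ones."""

    def __init__(self, backend: ToyBackend, seed: int = 0) -> None:
        self.backend = backend
        self.rng = random.Random(seed)
        self._store: Dict[Tuple[str, float], List[str]] = {}

    def generate(self, prompt: str, n: int, temperature: float = 1.0) -> List[str]:
        # The key must cover every setting that changes the output distribution;
        # a real cache would also include the model name and other sampling params.
        cached = self._store.setdefault((prompt, temperature), [])
        missing = n - len(cached)
        if missing > 0:
            cached.extend(self.backend.generate_n(prompt, missing, temperature, self.rng))
        return cached[:n]  # the slice is a copy, so callers cannot mutate the cache
```

For example, calling `generate("Hello", n=2)` and then `generate("Hello", n=5)` returns the same first two generations and samples only three new ones. Growing `n` this way is sound for independent sampling, because any prefix of independent samples is itself a valid set of independent samples.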