rlouf opened 7 months ago
It may also be interesting to get the joint token likelihood, if available. I'm not super familiar with outlines, but I'd love to be able to compare `Sequence`s probabilistically.
We could store that in addition to the sequence weights (which can be, but are not necessarily, the log-probability of the sequence).
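For reference, here is a minimal sketch of how per-token log-probabilities could be accumulated into a sequence-level log-likelihood during decoding. The `log_softmax` and `accumulate_logprob` helpers are hypothetical names for illustration, not part of the Outlines API:

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def accumulate_logprob(step_logits, chosen_ids):
    """Sum the log-probability of each chosen token.

    `step_logits` holds one logits vector per decoding step and
    `chosen_ids` the token picked at that step; the sum is the joint
    log-likelihood of the sampled sequence. (Hypothetical helper,
    not the Outlines implementation.)
    """
    per_token = []
    total = 0.0
    for logits, tok in zip(step_logits, chosen_ids):
        lp = log_softmax(logits)[tok]
        per_token.append(lp)
        total += lp
    return per_token, total
```

Storing `per_token` alongside the final weight would let users inspect token-level likelihoods even when the sequence weight itself is not a log-probability.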
Hi @rlouf, I was directed towards this issue by @lapp0 as a prerequisite issue for #657. I'm interested in contributing, but would like to get a sense of the scope of work involved so that I don't make promises I can't keep.
I'm also interested, and am currently working on it.
Great! It is fairly involved: many important design decisions need to be made, and we need to handle recomputation of the KV cache after concatenating text to a previous generation.
Don't hesitate to open a draft PR ASAP so I can give you feedback early on.
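To make the cache-reuse problem concrete, here is a toy sketch (not the Outlines implementation; a fake per-token "state" stands in for real key/value tensors) of keeping cached entries for a shared prefix and recomputing only the appended suffix after concatenation:

```python
class CachedEncoder:
    """Toy stand-in for a transformer with a KV cache.

    Each token's "state" is expensive to compute; after concatenating
    new text we only want to process the suffix the cache has not seen.
    (Illustrative sketch only.)
    """

    def __init__(self):
        self.cached_tokens = []   # prefix already processed
        self.cached_states = []   # one "KV entry" per cached token
        self.compute_calls = 0    # counts per-token forward passes

    def _state(self, token):
        self.compute_calls += 1
        return hash(token)  # placeholder for a real key/value pair

    def encode(self, tokens):
        # Length of the shared prefix between the cache and the input.
        k = 0
        while (k < len(self.cached_tokens) and k < len(tokens)
               and self.cached_tokens[k] == tokens[k]):
            k += 1
        # Drop stale entries, then compute states only for new tokens.
        self.cached_tokens = self.cached_tokens[:k]
        self.cached_states = self.cached_states[:k]
        for tok in tokens[k:]:
            self.cached_tokens.append(tok)
            self.cached_states.append(self._state(tok))
        return self.cached_states
```

Encoding `["a", "b"]` and then `["a", "b", "c", "d"]` triggers only two additional per-token computations, which is the behavior continuous generation would want from the real KV cache.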
> would like to get a sense of the scope of work involved so that I don't make promises I can't keep.
It is fairly involved; interleaving function calls should be easier to implement, though.
LmScript, a graphical interface for Outlines programs, makes heavy use of continuous generation.
We currently re-send the accumulated prompt for every generation call and handle the chat template on our end.
Better performance for continuous generation would be highly appreciated.
Super excited for this feature!
One note: it would be great if continuous generation were implemented so that intermediate outputs can be processed and reused during generation:
```python
sequence = "What are the most popular names of vehicles and the length of their names?\n"
for i in range(6):
    sequence += f"{i}, "
    vehicle_name_gen = generator(sequence, stop_at=["\n"])
    # `process` would be part of the outlines API and execute
    # the given function during generation
    name_len = process(len, vehicle_name_gen)
    sequence += vehicle_name_gen + ", " + name_len + " characters long."
    sequence += "\n"
```
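One way the `process` hook sketched above could work is as a deferred computation. A minimal sketch, where `Deferred` is a hypothetical wrapper and the generation is stubbed with a plain callable returning a fixed string:

```python
class Deferred:
    """Toy deferred value: applies `fn` to the result once it exists.

    Sketch of how a `process`-style hook could defer a Python function
    over not-yet-materialized generation output. `process` is a
    hypothetical name from the comment above, not a real Outlines API.
    """

    def __init__(self, fn, source):
        self.fn = fn
        self.source = source  # callable returning the generated text

    def resolve(self):
        return self.fn(self.source())

def process(fn, generation):
    return Deferred(fn, generation)

# Usage with a stubbed "generation" that just returns a fixed string:
gen = lambda: "Mustang"
name_len = process(len, gen)
```

The point of deferring is that `fn` only runs once the model has actually produced the text, so hooks like this could be scheduled inside the generation loop.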
I am opening this issue to roughly sketch the next big milestone for Outlines, tentatively called "continuous generation". There are many rough edges still, and open questions.
The first goal is to allow sampling of sequences like these:
By "sampling these sequences" I mean being able to run, for instance, beam search and optimize the sequence as a whole rather than each generation separately.
All we have to do is to return a `Sequence` object instead of a string, with the following attributes and methods:

`Sequence` should have the same feel as a string. Besides being able to print it, we should be able to slice it, add it to another string or another sequence, etc., and carry on.

This should be enough to bring Outlines to feature parity with other DSLs, while not being a DSL.
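As a rough illustration of the "feels like a string" requirement, a `Sequence` could be sketched by subclassing `str`. The `weight` attribute and all names here are assumptions for the sketch, not the final design:

```python
class Sequence(str):
    """String subclass carrying a sequence weight (sketch only).

    Subclassing `str` gives printing, slicing, and comparison for free;
    we override `__add__` so concatenation combines the weights.
    """

    def __new__(cls, text, weight=0.0):
        obj = super().__new__(cls, text)
        obj.weight = weight  # e.g. the sequence log-probability
        return obj

    def __add__(self, other):
        # Plain strings contribute a neutral weight of 0.0.
        other_weight = getattr(other, "weight", 0.0)
        return Sequence(str(self) + str(other), self.weight + other_weight)
```

Note that in this sketch slicing falls back to a plain `str` (Python's `str.__getitem__` does not preserve the subclass), so a real implementation would also need to decide what a slice's weight means.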