eth-sri / lmql

A language for constraint-guided and efficient LLM programming.
https://lmql.ai
Apache License 2.0

Tokens removed by STOPS_BEFORE count towards the length of the output #58

Open JasperDekoninck opened 1 year ago

JasperDekoninck commented 1 year ago

In the following query, we would not expect decoding to stop at the first occurrence of "wall" (output sentence: "what did the fish say when it hit the wall?"), because the text generated right before that word is only 37 characters long. However, the word "wall" itself is still counted towards the length of JOKE, so len(JOKE) > 40 evaluates to true, even though it should not yet: STOPS_BEFORE removes "wall" from the final value, so it should not count towards the length.

argmax
   """A list of good dad jokes. A indicates the punchline
   Q: How does a penguin build its house?
   A: Igloos it together.
   Q: Which knight invented King Arthur's Round Table?
   A: Sir Cumference.
   Q:[JOKE]"""
from
   "openai/text-davinci-003"
where
   STOPS_BEFORE(JOKE, "wall") and len(JOKE) > 40
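To make the expected behaviour concrete, here is a small Python sketch (not LMQL's implementation) that contrasts the two ways of measuring the generated text. `raw_length`, `visible_length`, `STOP` and `LIMIT` are hypothetical names. The sketch assumes that the length constraint should apply to the text STOPS_BEFORE keeps, meaning everything before the first occurrence of the stop phrase. It counts the space before "wall", so the "before" length comes out as 38 rather than 37.

```python
STOP = "wall"   # stop phrase from the query above
LIMIT = 40      # threshold from len(JOKE) > 40


def raw_length(text: str) -> int:
    # Length as currently evaluated: the stop phrase is still included.
    return len(text)


def visible_length(text: str, stop: str) -> int:
    # Length the reporter expects: drop everything from the first
    # occurrence of the stop phrase, as STOPS_BEFORE does for the final value.
    idx = text.find(stop)
    return len(text) if idx == -1 else idx


generated = "what did the fish say when it hit the wall"
print(raw_length(generated), raw_length(generated) > LIMIT)
print(visible_length(generated, STOP), visible_length(generated, STOP) > LIMIT)
```

When the stop phrase is counted, the 42-character string satisfies `len(JOKE) > 40`. When it is removed, the remaining 38 characters do not, so generation should continue past this "wall".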

j4acks0n commented 1 year ago

I got the same issue. Is there a solution?

lbeurerkellner commented 1 year ago

One workaround for now is to use STOPS_AT instead and to strip the suffix in code.
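Below is a minimal Python sketch of the "strip the suffix in code" step. It assumes you have already obtained the value of JOKE as a plain string from a query that uses STOPS_AT(JOKE, "wall") in place of STOPS_BEFORE. `strip_stop_phrase` is a hypothetical helper and is not part of LMQL.

```python
def strip_stop_phrase(value: str, stop: str) -> str:
    # Remove one trailing occurrence of the stop phrase, if present,
    # to recover what STOPS_BEFORE would have returned.
    if stop and value.endswith(stop):
        return value[: -len(stop)]
    return value


# Hypothetical value returned for JOKE under STOPS_AT(JOKE, "wall")
joke = " What did the fish say when it hit the wall"
print(repr(strip_stop_phrase(joke, "wall")))
```

The `stop and` guard avoids slicing with `-0`, which would otherwise return an empty string for an empty stop phrase. On Python 3.9+ `str.removesuffix` does the same job.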