dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0

Phi3 models generate text without spaces when using `outlines.models.mlxlm` #982

Closed ahgraber closed 4 months ago

ahgraber commented 4 months ago

Describe the issue as clearly as possible:

mlx-community/Phi-3-mini-4k-instruct-4bit and mlx-community/Phi-3-mini-4k-instruct-8bit generate text without spaces when using outlines. Interestingly, mlx-community/Meta-Llama-3-8B-Instruct-8bit generates text with spaces. When loading with mlx_lm, all models generate responses as expected.

The implication of this is that the structured generation based on a Pydantic model fails because the generated text (which has no spaces) fails Pydantic validation.
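
A minimal sketch of why a spaceless string fails the enum check, using only the standard library. The enum values are copied from the error message further down; the class layout and the `is_valid_fruit` helper are assumptions made for illustration and are not part of outlines or Pydantic.

```python
from enum import Enum


class FruitEnum(str, Enum):
    # Values copied from the validation error reported in this issue.
    RED = "red delicious apples"
    PURPLE = "purple juicy grapes"
    ORANGE = "orange sweet tangerines"
    YELLOW = "yellow ripe bananas"


def is_valid_fruit(value: str) -> bool:
    """Hypothetical helper: True if `value` is exactly one of the enum values."""
    try:
        FruitEnum(value)  # lookup by value; raises ValueError if no member matches
        return True
    except ValueError:
        return False


with_spaces = is_valid_fruit("yellow ripe bananas")
without_spaces = is_valid_fruit("yellowripebananas")
print(with_spaces, without_spaces)
```

Enum lookup is an exact string match, so any output that drops the spaces cannot match a multi-word member.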

Is this an issue with how outlines handles phi3 responses, or is it an issue with the phi3 models from mlx-community?

Some basic testing I did: Using phi3 MLX models directly with mlx_lm works fine:

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Phi-3-mini-4k-instruct-8bit")
response = generate(model, tokenizer, prompt="What is a good apple?", verbose=False)
print(response)

'\n<|assistant|> A good apple can vary depending on personal preferences, but generally, a good apple is one that is:\n\n1. Firm: A good apple should have a firm texture, which indicates that it is fresh and not overripe.\n2. Colorful: A good apple should have a vibrant and consistent color, which can indicate ripeness and flavor.\n3. Taste: A good apple should have a balanced and pleasant taste, with a'

However, using the same model with outlines returns text without spaces:

from outlines import generate, models

model = models.mlxlm("mlx-community/Phi-3-mini-4k-instruct-8bit")
generator = generate.text(model)
response = generator("What is a good apple?")
print(response)

"\nAgoodapplecanvarybasedonpersonaltastepreferences,butherearesomecommoncharacteristicstoconsiderwhenchoosingafreshandtastyapple:\n\n1.Texture:Agoodappleshouldhaveafirmtexture,withnooverlysoftspotsorsignsofdecay.Theskinshouldnotbemushyordiscolored,andtheappleshouldbouncebackwhengentlypressedwithyourfingertip.\n\n2.Color:Aripeapplewillusuallyhaveaconsistentandvibrantcolor,dependingonitsvariety.Forexample,aRedDeliciousappleshouldhaveabrightredhue,whileaGrannySmithapplewillbeabrightgreenoradarker,moregreenish-yellowcolor.\n\n3.Smell:Agood,ripeapplewillhaveasweet,captivatingfragrance.Iftheapplesmellssourorunpleasant,itmightnotbefresh.\n\n4.Taste:Whiletastecanbesubjective,agoodapplehasapleasantandbalancedflavor,typicallycenteredonamixofsweetnessandabitofacidity.Agoodappleshouldtasterefreshingandjuicy.\n\n5.Variety:Differentapplevarietiesaresuitedtodifferentpurposes,suchaseatingraw,cooking,ormakingjuice.Somepopulartastyapplevarietiesinclude:\n\n -Honeycrisp:Knownforitscrisptextureandbalanceofsweetandtartflavors.\n -Fuji:Sweetandintenselyflavorful,greatforbotheatingrawandcooking.\n -Gala:Mildlysweetwitharomatichintsofpear,makingitgreatforsnacking.\n -GrannySmith:Tangyandfirm,idealforbaking,preserving,oraddingcrispnesstosaladsanddesserts.\n 
-RedDelicious:Knownforitsdeepredcolor,butitstastehasbeendebated.Somepeopleenjoyitsmildsweetness,whileothersmayfinditlessflavorful.\n\nRememberthattastecandifferfrompersontoperson,soit'salwayswisetosamplevariousappletypestofindwhichroundsoffyourpersonalpreferences.Inadditiontothecharacteristicsmentionedabove,herearesomeadditionalfactorstoconsiderwhenchoosingagoodapple:\n\n6.Storingandfreshness:Anapplethatwasfreshlyharvestedtendstobejuicierandfullofflavor.Whenshopping,keepaneyeoutforapplevarietiesthatareinseason,asthisincreasesthechancesofgettingamoreflavorfulapple.\n\n7.Organicorlocallysourced:Considerchoosingorganicapples,grownwithoutsyntheticpesticidesorfertilizers,ifyouwanttominimizeyourexposuretochemicals.Locallysourcedapplescanbefresherandsupportlocalfarmers.\n\n8.Preparingandhandling:Handleapplesgently,astheytendtobruiseeasily.Applesstoredintherefrigerator'scrisperdrawerwilllastlongerandpreservetheiroptimumfreshness.Alwayswashapplesbeforeconsumingthem.\n\nUltimately,findingagoodappledependsonindividualtastepreferences,sotryingdifferentvarietiesandconsideringthesefactorswillhelpyoufindtheperfectappleforyou."

Steps/code to reproduce the bug:

from outlines import generate, models
from enum import StrEnum
from typing import List  # needed for the List[FruitEnum] annotation below
from pydantic import BaseModel, Field
import json
import textwrap

model = models.mlxlm("mlx-community/Phi-3-mini-4k-instruct-8bit")

FRUITS = [
    "Red delicious apples",
    "Purple juicy grapes",
    "Orange sweet tangerines",
    "Yellow ripe bananas",
]

FruitEnum = StrEnum("FruitEnum", FRUITS)

class Fruit(BaseModel):
    """Return fruit description."""

    fruits: List[FruitEnum] = Field(
        description="A list of appetizing fruits",
        min_length=1,
    )

generator = generate.json(
    model,
    Fruit,
    # whitespace_pattern=r"[\n\t ]*",  # None, r"[ ]", r"[ ]*", r"[\n\t ]", r"[\n\t ]*"
)  # only call once per schema, not per-generation

# heal invalid json
invalid_json = """{fruits: ["bananas", "apples", "oranges"]}"""
result = generator(
    textwrap.dedent(
        f"""
Fix this JSON by enforcing the following schema:

{json.dumps(Fruit.model_json_schema())}

---

'{invalid_json}'
"""
    ).strip()
)
print(json.dumps(json.loads(result.json()), indent=2))

Expected result:

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Phi-3-mini-4k-instruct-8bit")
response = generate(model, tokenizer, prompt="What is a good apple?", verbose=False)
print(response)

`'\n<|assistant|> A good apple can vary depending on personal preferences, but generally, a good apple is one that is:\n\n1. Firm: A good apple should have a firm texture, which indicates that it is fresh and not overripe.\n2. Colorful: A good apple should have a vibrant and consistent color, which can indicate ripeness and flavor.\n3. Taste: A good apple should have a balanced and pleasant taste, with a'`

Error message:

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
File /Users/alex.graber/_code/is-pmiaivolution-datascienceanalytics/notebooks/infinity-client/structured_generation.py:4
      1 # %%
      2 # heal invalid json
      3 invalid_json = """{fruits: ["bananas", "apples", "oranges"]}"""
----> 4 result = generator(
      5     textwrap.dedent(
      6         f"""
      7 Fix this JSON by enforcing the following schema:
      8 
      9 
     10 {json.dumps(Fruit.model_json_schema())}
     11 
     12 
     13 ---
     14 
     15 '{invalid_json}'
     16 """
     17     ).strip()
     18 )
     19 print(json.dumps(json.loads(result.json()), indent=2))

File ~/micromamba/envs/infinity/lib/python3.11/site-packages/outlines/generate/api.py:511, in SequenceGeneratorAdapter.__call__(self, prompts, max_tokens, stop_at, seed, **model_specific_params)
    499 generation_params = self.prepare_generation_parameters(
    500     max_tokens, stop_at, seed
    501 )
    503 completions = self.model.generate(
    504     prompts,
    505     generation_params,
   (...)
    508     **model_specific_params,
    509 )
--> 511 return format(completions)

File ~/micromamba/envs/infinity/lib/python3.11/site-packages/outlines/generate/api.py:497, in SequenceGeneratorAdapter.__call__.<locals>.format(sequences)
    495     return [format(sequence) for sequence in sequences]
...
  Input should be 'red delicious apples', 'purple juicy grapes', 'orange sweet tangerines' or 'yellow ripe bananas' [type=enum, input_value='yellowripebananas', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/enum
fruits.2
  Input should be 'red delicious apples', 'purple juicy grapes', 'orange sweet tangerines' or 'yellow ripe bananas' [type=enum, input_value='purplejuicygrapes', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/enum

Context for the issue:

I want to use the smallest model possible with structured generation to "heal" invalid JSON. Phi3-mini-4k generally fits the bill, except that I get validation errors due to the lack of spaces in the generated response when using the MLX model variant.
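
As a possible stopgap until the root cause is fixed, spaceless strings could be mapped back to the allowed values by comparing them with whitespace removed. This is only a sketch under assumptions: `restore_spaces` is a hypothetical helper, not an outlines feature, and it only helps when the raw text is available before enum validation runs (for example, when generating against a schema that uses plain strings).

```python
import re

# Allowed values, copied from the validation error in this issue.
ALLOWED = [
    "red delicious apples",
    "purple juicy grapes",
    "orange sweet tangerines",
    "yellow ripe bananas",
]


def _squash(text: str) -> str:
    """Remove all whitespace and lowercase, so spacing differences are ignored."""
    return re.sub(r"\s+", "", text).lower()


def restore_spaces(value: str, allowed=ALLOWED):
    """Return the allowed value matching `value` once whitespace is ignored, else None."""
    lookup = {_squash(candidate): candidate for candidate in allowed}
    return lookup.get(_squash(value))


print(restore_spaces("yellowripebananas"))
print(restore_spaces("kiwi"))
```

This works around the symptom only, and it would silently merge two allowed values that differ solely in their spacing.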

lapp0 commented 4 months ago

Thanks for reporting.

Preliminary: it looks like the tokenizer is behaving incorrectly when using phi-3.

I'll work on a fix.
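
One plausible mechanism for the tokenizer behaviour suspected above, not confirmed anywhere in this thread: SentencePiece-style vocabularies mark word boundaries with a leading "▁", and a decoder that strips the leading space loses every boundary if it is called once per token instead of once per sequence. The toy decoder below is an assumption for illustration and does not use the real Phi-3 tokenizer or any outlines code.

```python
SPACE_MARKER = "\u2581"  # "▁", the SentencePiece word-boundary marker


def decode_sequence(tokens):
    """Toy decoder: join tokens, turn markers into spaces, drop one leading space."""
    text = "".join(tokens).replace(SPACE_MARKER, " ")
    return text[1:] if text.startswith(" ") else text


def decode_per_token(tokens):
    """Naive variant: decode each token in isolation, then concatenate the pieces."""
    return "".join(decode_sequence([token]) for token in tokens)


tokens = [SPACE_MARKER + "A", SPACE_MARKER + "good", SPACE_MARKER + "apple"]

full = decode_sequence(tokens)    # spaces between words are preserved
naive = decode_per_token(tokens)  # each token loses its leading space
print(repr(full), repr(naive))
```

If something similar happens in the mlxlm integration, it would account for the `Agoodapple...` output shown earlier. The actual cause is whatever the eventual fix identifies.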