Dan-wanna-M / formatron

Formatron empowers everyone to control the format of language models' output with minimal overhead.

Not working under exllamav2 integration #23

Closed: saturosfz closed this issue 3 weeks ago

saturosfz commented 3 weeks ago

Version: formatron 0.4.7, exllamav2 0.2.3
Problem: the generated output is incomplete JSON.
Code:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator
from formatron.integrations.exllamav2 import create_formatter_filter
from formatron.formatter import FormatterBuilder
from formatron.schemas.pydantic import ClassSchema

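# Load the model with a lazily allocated cache, autosplitting across available GPUs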
model_dir = "/data/models/Qwen2.5-14B-Instruct-8.0bpw-h8-exl2"
config = ExLlamaV2Config(model_dir)
config.arch_compat_overrides()
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, max_seq_len = 4096, lazy = True)
model.load_autosplit(cache, progress = True)

print("Loading tokenizer...")
tokenizer = ExLlamaV2Tokenizer(config)

# Initialize the generator with all default parameters
generator = ExLlamaV2DynamicGenerator(
    model = model,
    cache = cache,
    tokenizer = tokenizer,
)

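# Pydantic-based schema describing the JSON object we want the model to produce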
class Superhero(ClassSchema):
    name: str
    secret_identity: str

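# Build a formatter that constrains generation to the Superhero schema,
# then wrap it as an exllamav2 filter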
f = FormatterBuilder()
f.append_line(f"{f.json(Superhero, capture_name='output')}")
lmfilter = create_formatter_filter(model, tokenizer, f)

output = generator.generate(
    prompt = "Here is some information about Superman:\n\n",
    filters = [lmfilter],
    filter_prefer_eos = True,
    max_new_tokens = 300,
    add_bos = True,
    stop_conditions = [tokenizer.eos_token_id],
    completion_only = True
)

print(output)
```
saturosfz commented 3 weeks ago

It works under the exllamav2 dev branch.
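(For anyone who wants to try that before it lands in a release, installing the dev branch from source, e.g. pip install git+https://github.com/turboderp/exllamav2@dev assuming the usual turboderp/exllamav2 repository, should presumably work as well.)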

Dan-wanna-M commented 3 weeks ago

For people encountering the same issue in exllamav2 v0.2.3: setting filter_prefer_eos=False makes it work as intended as well. Formatron currently treats eos_token as a normal token, so filter_prefer_eos=True will always sample eos_token as soon as it is allowed, which is why the JSON output comes out incomplete here.
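In other words, on v0.2.3 the reproduction above should work with just that one flag changed. A minimal sketch of the adjusted call (everything except filter_prefer_eos is unchanged from the code above):

```python
output = generator.generate(
    prompt = "Here is some information about Superman:\n\n",
    filters = [lmfilter],
    filter_prefer_eos = False,  # do not force eos_token as soon as the filter allows it
    max_new_tokens = 300,
    add_bos = True,
    stop_conditions = [tokenizer.eos_token_id],
    completion_only = True
)

print(output)
```

With this setting, generation stops only when the sampler actually picks eos_token among the tokens Formatron's grammar allows, rather than the moment eos_token first becomes legal.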