
Build generation pipelines from YAML config files #604

Open · 7flash opened 7 months ago

7flash commented 7 months ago

Presentation of the new feature

I have currently implemented the following CLI tool, which uses outlines:

#!/usr/bin/env python3

import argparse
import os

import outlines.models as models

def main(prompt_path, model_name):
    # Read the prompt from the given file
    prompt_path = os.path.expanduser(prompt_path)
    with open(prompt_path, 'r') as file:
        prompt = file.read().strip()

    # Point outlines at an OpenAI-compatible endpoint (credentials elided)
    generator = models.openai_compatible_api(model_name, base_url="", api_key="sk-xx", encoding="gpt-4")

    answer = generator(prompt)

    # Write the completion next to the prompt, swapping the -in/-out suffix
    out_file_path = os.path.join(os.path.dirname(prompt_path), os.path.basename(prompt_path).replace("-in.txt", "-out.txt"))

    with open(out_file_path, 'w') as out_file:
        out_file.write(answer)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate a response from a prompt file using an OpenAI model and write the response to a new file.")
    parser.add_argument("prompt_path", type=str, help="Path to the file containing the prompt text.")
    parser.add_argument("--model", type=str, default="gpt-4-0125-preview", help="Model name to be used. Default is 'gpt-4-0125-preview'.")

    args = parser.parse_args()

    main(args.prompt_path, args.model)

It reads a prompt from the given file path and writes the completion into a sibling output file.

I've found it extremely convenient to edit my prompt in the Helix editor and then view the response in Helix again.

Therefore I added the following workflow in the Warp terminal:

xit="$(date +"%b%d-%H%M")"
pit="Documents/feb2-prompts"
ipixit="$HOME/$pit/$xit-in.txt"
opixit="$HOME/$pit/$xit-out.txt"
zit="$HOME/Documents/Feb02-0833.py"

hx $ipixit && python3 $zit $ipixit && hx $opixit

It opens a new file where I write the prompt and close it with :wq; the CLI tool then generates a completion for it and opens the output file in Helix again.

I have found this workflow to be cleaner and more productive than any other existing UI.

What I'm missing, of course, are the other capabilities of outlines, such as JSON and regex generation. These could be embedded into the prompt file with YAML frontmatter, i.e. a block of key-value pairs preceding the actual file content.

Example 1. regexp

regexp: \d+
---
Calculate two plus two
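
Under the hood I would expect this to map to outlines' regex generation, roughly like the following sketch (assuming a local model already loaded into model):

from outlines import generate

generator = generate.regex(model, r"\d+")
answer = generator("Calculate two plus two")  # e.g. "4"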

Example 2. json

schema:
  - class: string
  - level: number
---
Generate a character

Example 3. template

variables:
  task: first
  subject: second 
---
Execute {task} over {subject}

Example 4. choice, etc.

Note: the first two examples are supposed to work with local models; the third example is compatible with my original script.
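
To make the intended behavior concrete, here is a rough sketch of how the tool could dispatch on the frontmatter. The keys follow the examples above, and run_prompt_file and the assumption that schema already holds a JSON-schema mapping are mine, not an existing outlines API:

import json

import yaml
from outlines import generate

def run_prompt_file(path, model):
    with open(path) as f:
        text = f.read()

    # Everything before the first "---" is YAML frontmatter;
    # everything after it is the prompt body.
    front, _, body = text.partition("---")
    config = yaml.safe_load(front) or {}
    prompt = body.strip()

    # Example 3: fill in template variables
    if "variables" in config:
        prompt = prompt.format(**config["variables"])

    if "regexp" in config:
        # Example 1: constrain output with a regular expression
        generator = generate.regex(model, config["regexp"])
    elif "schema" in config:
        # Example 2: constrain output to a JSON schema
        generator = generate.json(model, json.dumps(config["schema"]))
    else:
        generator = generate.text(model)

    return generator(prompt)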

rlouf commented 7 months ago

So if I understand correctly, you would like your second example to be the equivalent of:

from outlines import generate
from pydantic import BaseModel, Field

def generation(model):
    prompt = "Generate a character"

    # "class" is a reserved keyword in Python, so the field needs an alias
    class Character(BaseModel):
        class_: str = Field(alias="class")
        level: float

    generator = generate.json(model, Character)
    return generator(prompt)

I am not a huge fan of using configuration files instead of code. Not only do you need to parse them, but they are also untestable and do not generalize well to complex use cases. I am however open to a code construct that would allow you to abstract and simplify this kind of workflow; outlines functions were designed with this in mind, and we can think of ways to extend them.
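
For instance, your third example is already close to what prompt templates provide. A sketch, assuming the @outlines.prompt decorator with its Jinja2 docstring syntax:

import outlines

# The docstring is a Jinja2 template; calling the function renders it.
@outlines.prompt
def execute(task, subject):
    """Execute {{ task }} over {{ subject }}"""

prompt = execute("first", "second")
# -> "Execute first over second"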