aowen14 opened 3 months ago
Thank you, happy to review a PR!

`outlines.serve` should support JSON: https://outlines-dev.github.io/outlines/reference/serve/vllm/#querying-endpoint

Additionally, `outlines.models.vllm` supports JSON as well. Could you please clarify the issue you ran into when trying this?
@rlouf I would be happy to create a PR for the docker setup, but first I want to fully answer @lapp0's question, as it might be important for why I would like to use accelerate. I would prefer to use vLLM.

I created a simple pydantic use case for vLLM, transformers, and serve; below is the code for each and the output it produced. Since running these, I added params to the vLLM example and it started returning valid JSON, but I was expecting it to work out of the box, since vLLM has default parameters and Outlines should be restricting output to the correct schema(?)
outlines.serve example:

```python
import requests
from pydantic import BaseModel

# Define the Book model
class Book(BaseModel):
    title: str
    author: str
    year: int

# Define the request parameters
ip_address = "localhost"
port = "8000"
prompt = "Create a book entry with the fields title, author, and year"
schema = Book.model_json_schema()

# Create the request body
outlines_request = {
    "prompt": prompt,
    "schema": schema,
}

print("Prompt: ", prompt)

# Make the API call
response = requests.post(f"http://{ip_address}:{port}/generate/", json=outlines_request)

# Check if the request was successful
if response.status_code == 200:
    result = response.json()
    print("Result:", result["text"])
else:
    print(f"Error: {response.status_code}, {response.text}")
```
Server command:

```bash
python -m outlines.serve.serve --model="microsoft/Phi-3-mini-128k-instruct" --max-model-len 5000
```
Output:

```text
Prompt: Create a book entry with the fields title, author, and year
Result: ['Create a book entry with the fields title, author, and year{ "title": "The Great Gatsby", "author": "F']
```
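The result is cut off mid-string, which looks like a token-budget issue rather than a schema issue: vLLM's `SamplingParams` defaults to `max_tokens=16`. Assuming the serve endpoint forwards extra request fields to `SamplingParams` (as vLLM's reference API server does), here is a sketch of the same request with a larger budget:

```python
# Sketch (untested): pass max_tokens through the request body so the
# constrained generation has room to reach the closing brace.
outlines_request = {
    "prompt": prompt,
    "schema": schema,
    "max_tokens": 200,  # default is 16, which truncates the JSON
}
response = requests.post(f"http://{ip_address}:{port}/generate/", json=outlines_request)
```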
outlines.models.vllm example (the call below includes `sampling_params`, matching line 27 in the traceback):

```python
from outlines import models, generate
from pydantic import BaseModel
from vllm import SamplingParams

class Book(BaseModel):
    title: str
    author: str
    year: int

print("\n\npydantic_vllm_example\n\n")

model = models.vllm("microsoft/Phi-3-mini-128k-instruct", max_model_len=25000)
params = SamplingParams(temperature=0, top_k=-1)
generator = generate.json(model, Book)
prompt = "Create a book entry with the fields title, author, and year"
result = generator(prompt, sampling_params=params)
print("Prompt:", prompt)
print("Result:", result)
```
Output:

```text
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/pydantic/main.py", line 1160, in parse_raw
[rank0]:     obj = parse.load_str_bytes(
[rank0]:           ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/pydantic/deprecated/parse.py", line 49, in load_str_bytes
[rank0]:     return json_loads(b)  # type: ignore
[rank0]:            ^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/json/__init__.py", line 346, in loads
[rank0]:     return _default_decoder.decode(s)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/json/decoder.py", line 337, in decode
[rank0]:     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/json/decoder.py", line 353, in raw_decode
[rank0]:     obj, end = self.scan_once(s, idx)
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^
[rank0]: json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 42 (char 41)

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "<frozen runpy>", line 198, in _run_module_as_main
[rank0]:   File "<frozen runpy>", line 88, in _run_code
[rank0]:   File "/home/lambda1/AlexCode/Performance-Benchmarking/outlines_local_vllm.py", line 27, in <module>
[rank0]:     result = generator(prompt, sampling_params=params)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/outlines/generate/api.py", line 511, in __call__
[rank0]:     return format(completions)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/outlines/generate/api.py", line 497, in format
[rank0]:     return self.format_sequence(sequences)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/outlines/generate/json.py", line 50, in <lambda>
[rank0]:     generator.format_sequence = lambda x: schema_object.parse_raw(x)
[rank0]:                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/lambda1/miniconda3/envs/outlines/lib/python3.11/site-packages/pydantic/main.py", line 1187, in parse_raw
[rank0]:     raise pydantic_core.ValidationError.from_exception_data(cls.__name__, [error])
[rank0]: pydantic_core._pydantic_core.ValidationError: 1 validation error for Book
[rank0]: __root__
[rank0]:   Unterminated string starting at: line 1 column 42 (char 41) [type=value_error.jsondecode, input_value='{ "title": "The Great Gatsby", "author": "F', input_type=str]
```
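The failure here is the same truncation: the raw output `'{ "title": "The Great Gatsby", "author": "F'` is a valid JSON prefix that was cut off, again consistent with the default `max_tokens=16`. A minimal sketch of the parameter change that presumably made it return valid JSON (the value 200 is an arbitrary but sufficient assumption):

```python
# Sketch: raise the token budget so the constrained generator can emit
# complete JSON instead of stopping after vLLM's default 16 tokens.
params = SamplingParams(temperature=0, top_k=-1, max_tokens=200)
result = generator(prompt, sampling_params=params)
```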
outlines.models.transformers example:

```python
from outlines import models, generate
from pydantic import BaseModel
from outlines.samplers import greedy

class Book(BaseModel):
    title: str
    author: str
    year: int

model = models.transformers("microsoft/Phi-3-mini-128k-instruct", device="cuda:0")

print("\n\npydantic_transformers_example\n\n")

generator = generate.json(model, Book)
prompt = "Create a book entry with the fields title, author, and year"
result = generator(prompt)
print("Prompt:", prompt)
print("Result:", result)
```
Output:

```text
Prompt: Create a book entry with the fields title, author, and year
Result: title='Invisible Cities' author='Italo Calvino' year=1974
```
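For reference, `generate.json` returns a parsed pydantic `Book` instance here, so it can be serialized straight back to a JSON string:

```python
# result is a pydantic Book instance, not a raw string
print(result.model_dump_json())  # {"title": "Invisible Cities", "author": "Italo Calvino", "year": 1974}
```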
I'll look into the bug with json handling in vLLM.
Hi, just checking in here. Any updates on when this/the relevant PRs might be finished? Mainly asking as it affects a content schedule where we would be talking about outlines. Thanks!
Have you tried using vLLM's structured output feature in their OpenAI-compatible API? They use outlines under the hood.
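For reference, a minimal sketch of that route, assuming a recent vLLM running its OpenAI-compatible server (`guided_json` is vLLM's extension field passed via `extra_body`):

```python
# Sketch: query a vLLM OpenAI-compatible server with outlines-backed guided decoding.
# Start the server first, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model microsoft/Phi-3-mini-128k-instruct
from openai import OpenAI
from pydantic import BaseModel

class Book(BaseModel):
    title: str
    author: str
    year: int

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
completion = client.chat.completions.create(
    model="microsoft/Phi-3-mini-128k-instruct",
    messages=[{"role": "user", "content": "Create a book entry with the fields title, author, and year"}],
    max_tokens=200,
    extra_body={"guided_json": Book.model_json_schema()},  # vLLM-specific extension
)
print(completion.choices[0].message.content)
```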
I plan on getting there at some point soon but was waiting on this. I don't view using Outlines directly and Outlines via vLLM as mutually exclusive for our purposes, as we were looking to make pieces about both :). I was thinking the original Outlines post would be a good intro for both of them.

Also, I saw the release of outlines-core, which could be another cool thing to put into the post as well.

I'm happy to go down the path of vLLM for this in the meantime!
Happy to review a PR that adds `accelerate` to the image!
Describe the issue as clearly as possible:

Not sure if this is a bug or a feature request, but `accelerate` apparently isn't installed in the docker image. This means one can either use transformers with no GPU acceleration, or vLLM, and vLLM currently doesn't have feature parity with transformers from what I can tell (like `generate.json()`). Running the code outside of the image with the library plus `accelerate` works. Running `pip install accelerate` in the container also solves the issue, and the marginal download seems very small.

Steps/code to reproduce the bug:
Expected result:
Error message:
Outlines/Python version information:
Docker Image Version Hash: 98c8512bd46f
Context for the issue:
I'm trying to write a post on using Outlines with Vast, and Vast needs everything to run inside a docker container. It would be great if users could start their workloads in the container without needing to install accelerate first.
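In the meantime, a minimal sketch of a derived image as a workaround (the base image name/tag is an assumption; substitute the actual published Outlines image):

```dockerfile
# Hypothetical Dockerfile extending the Outlines image with accelerate preinstalled
FROM outlinesdev/outlines:latest
RUN pip install --no-cache-dir accelerate
```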