I think I'll pull from https://github.com/outlines-dev/outlines/pull/781, which will probably solve items 1 and 3.
thanks, looking good so far... it's nice that outlines already supports exl2
@edk208 Some notes
So, in summary, I think these are all the changes that can work from the main branch of outlines so far. Happy to get feedback!
I'll do the streaming idea tonight
What do you mean by the "logic of first doing preprocess and then generating tokens"? Do you mean the first model.forward with preprocess_only=True?
@edk208 Sorry for the confusion, and yes. To my understanding, the process is:
I think step 1 is technically not possible in outlines, but steps 2 and 3 might be possible with the above PR. Let me try it tomorrow.
@isamu-isozaki yes, that's correct. The preprocess runs the prompts through and sets up the KV cache; then you can round-robin through them and generate one token at a time. Interesting that outlines doesn't like step 1. I would imagine it would have to do that anyway. I can take a look too in the next few days.
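To make that concrete, here is a rough, self-contained toy sketch of the flow as I understand it. Everything below is a stand-in (ToyModel, the list-based caches, the greedy sampling); only the preprocess_only flag mirrors the real model.forward argument, and none of it is the actual code in llm_exl2_client_multi.py.

import torch

# Toy sketch of the prefill + round-robin decode flow described above.
VOCAB_SIZE = 100

class ToyModel:
    def forward(self, ids, cache, preprocess_only=False):
        cache.extend(ids.tolist())        # pretend to append keys/values for these tokens
        if preprocess_only:
            return None                   # prefill pass: populate the cache, no logits needed
        return torch.randn(VOCAB_SIZE)    # fake next-token logits

def generate_round_robin(model, prompts, max_new_tokens=8):
    # Prefill: run every prompt through once to set up its cache.
    seqs, caches = [], []
    for prompt_ids in prompts:
        ids = torch.tensor(prompt_ids)
        cache = []
        model.forward(ids, cache, preprocess_only=True)
        seqs.append(ids)
        caches.append(cache)

    # Round-robin decode: one token per sequence per pass.
    for _ in range(max_new_tokens):
        for i in range(len(seqs)):
            logits = model.forward(seqs[i][-1:], caches[i])  # feed only the newest token
            next_tok = torch.argmax(logits)                  # an outlines logits processor would mask here
            seqs[i] = torch.cat([seqs[i], next_tok.view(1)])
    return seqs

print(generate_round_robin(ToyModel(), [[1, 2, 3], [4, 5]]))

In the real server the caches and forward calls go through exllamav2, and the sampling step is where outlines would constrain the next token.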
Hi! I think the main logic is done. For the test, I used this config.ini:
[settings]
host = 127.0.0.1
port = 12345
upload_url = https://url/api/upload
path_url = https://url/folder/
[phi3b]
string = phi3b
repo = ..../Phi-3-mini-128k-instruct-exl2
with the model from here, and I started the server with:
python llm_exl2_client_multi.py --port=5000 --use_outlines --gpu_split="5" --max_context=512 --repo_str=phi3b
Then, on the client side, I did:
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import HumanMessage, SystemMessage, AIMessage
import json
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(
    temperature=1.0,
    openai_api_base="http://localhost:5000/v1",
    openai_api_key="Test",
    streaming=True,
    max_tokens=1024,
)

messages = [
    SystemMessage(
        content="You are a helpful assistant."
    ),
    HumanMessage(
        content="Who is more impressive? Bob or Fred?"
    ),
]

choices = ["Bob", "Fred"]
for chunk in llm.stream(messages, extra_body={"stop_at": "done", "outlines_type": "choices", "choices": choices}):
    print(chunk.content, end="", flush=True)
which got me "Bob". I can do more tests if you want, but I think it's working. One main design decision here: to add new parameters on top of the OpenAI API, we use extra_body rather than function calling/tool calling, since I couldn't think of an easy way to translate those parameters through tool calls.
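For reference on how extra_body lands server-side: the openai client just merges those keys into the JSON payload, so they arrive as extra top-level fields on the chat-completions request. Below is a minimal sketch of how a handler could pick them up; the request model and dispatch are illustrative only, not the actual code in this PR, and outlines.generate.choice / outlines.generate.text are the upstream outlines helpers as I understand them.

from typing import List, Optional
from pydantic import BaseModel
import outlines

# Illustrative request model: the extra_body keys from the client test above
# ("outlines_type", "choices", "stop_at") show up as plain top-level fields
# next to the usual OpenAI chat-completions fields.
class ChatCompletionRequest(BaseModel):
    model: str
    messages: List[dict]
    max_tokens: Optional[int] = None
    stream: bool = False
    outlines_type: Optional[str] = None   # e.g. "choices"
    choices: Optional[List[str]] = None   # used when outlines_type == "choices"
    stop_at: Optional[str] = None

def pick_generator(req: ChatCompletionRequest, outlines_model):
    # Dispatch on the extra fields; for the test above this constrains the
    # output to one of ["Bob", "Fred"].
    if req.outlines_type == "choices" and req.choices:
        return outlines.generate.choice(outlines_model, req.choices)
    return outlines.generate.text(outlines_model)

The upside of this approach is that nothing about the client-side OpenAI request shape has to change beyond adding extra_body.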
This is a draft PR. Currently, the 3 main parts left to do to make this work are: