acsresearch / interlab


Look into LangChain queries and general integration #1

Closed: gavento closed this issue 1 year ago

gavento commented 1 year ago

Rationale:

  1. Our focus has shifted from "primarily structured queries" to "primarily agent interactions"
  2. LangChain seems to have improved a lot in the last ~3 months (and is also more widely used, has a ton of integrations etc.) (Tomáš: I just updated on this a few days ago)
  3. QueryChains does not offer great usability for parsing responses as data (XML-like tags have a rather low ceiling, and e.g. we still do not have a standard formatting prompt text), and it would be a lot of work to add or improve this now.

We may want to just use LangChain for what it seems good at:

And use & focus querychains (to be renamed) on:

gavento commented 1 year ago

Notes: Parsing LLM output still seems not entirely solved; it is unclear what most people use, and the landscape seems wild (some even use instructions like "on the last line, write X"), but JSON (plus a schema and examples) seems to be one of the best approaches.
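
As an illustration, a minimal "JSON + schema + examples" prompt might look like the following (hypothetical wording, not a prompt from the experiments below):

# A hypothetical minimal prompt combining JSON, an informal schema, and an example
PROMPT = """Answer with a single JSON object matching this schema:
{"setup": "<question setting up a joke>", "punchline": "<answer resolving it>"}

For example:
{"setup": "Why did the chicken cross the road?", "punchline": "To get to the other side."}

Now tell me a joke about programmers. Answer only with the JSON object."""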

Other examples of structured output include Guidance from MS, although it performs much better with models where you (a) see the logits and (b) can control the generation flow (inserting fixed tokens etc.). Likely not useful for us (yet) but sort of on the radar; a rough sketch follows.
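
For a rough idea, a Guidance template interleaves fixed tokens with constrained generation. This is a sketch from memory of the mid-2023 API (an assumption; details may differ, and the API has since changed):

import guidance

# Sketch of the mid-2023 Guidance API (assumed, may differ). The JSON skeleton
# is fixed text inserted by the template, so the model only generates the
# quoted values and the output is valid JSON by construction.
guidance.llm = guidance.llms.OpenAI("text-davinci-003")
program = guidance('''{
    "setup": "{{gen 'setup' stop='"'}}",
    "punchline": "{{gen 'punchline' stop='"'}}"
}''')
out = program()
print(out["setup"], out["punchline"])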

gavento commented 1 year ago

tl;dr: Works great with GPT-4 (every time); davinci-003, GPT-3.5, Claude, and Falcon-40B-instruct also work, though they may require retries for complex JSON schemas. 13B-parameter models (e.g. curie) mostly fail, but this may depend on fine-tuning for JSON (so it may vary across open-source models). Running larger OSS models (falcon-40b) is still technically tricky (HF endpoints failed or were slow).

Setup

I did some experiments with Pydantic JSON prompting and parsing, using this helper function:

from langchain.chat_models.base import BaseChatModel
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate

def query_json(llm, type, prompt, **vars):
    """Query `llm` with `prompt` and parse the reply into a Pydantic object of class `type`."""
    if isinstance(prompt, str):
        prompt = PromptTemplate.from_template(prompt)
    assert "format_instructions" in prompt.input_variables
    parser = PydanticOutputParser(pydantic_object=type)
    input = prompt.format_prompt(format_instructions=parser.get_format_instructions(), **vars)
    # Chat models take a list of messages; plain completion LLMs take a string
    if isinstance(llm, BaseChatModel):
        output = llm(input.to_messages()).content
    else:
        output = llm(input.to_string())
    print(output)  # log the raw model output before parsing
    return parser.parse(output)

and the Pydantic classes Joke and Nobelists (with their helpers Gender and Nobelist):

import datetime
import enum
from typing import List

from pydantic import BaseModel, Field

class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

class Gender(enum.Enum):
    m = 'M'
    f = 'F'
    other = 'other'

class Nobelist(BaseModel):
    year: int = Field(description="year of the award")
    # NB: a plain datetime.datetime field fails to parse dates like 2020-10-05
    date: datetime.date = Field(description="date of the award as YYYY-MM-DD")  # more robust
    first_name: str = Field(description="first name of the nobelist")
    last_name: str = Field(description="last name of the nobelist")
    fields: List[str] = Field(description="major research fields the nobelist worked in")
    gender: Gender

class Nobelists(BaseModel):
    awards: List[Nobelist] = Field(description="list of awarded Nobel prizes")
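
For reference, putting these together might look like the following. This is a hypothetical invocation; the model name and prompt wording are illustrative assumptions, not taken from the experiments:

from langchain.chat_models import ChatOpenAI

# Hypothetical usage of query_json with the Nobelists schema above
llm = ChatOpenAI(model_name="gpt-4", temperature=0)
result = query_json(
    llm,
    Nobelists,
    "List the Nobel laureates in physics in {year}.\n\n{format_instructions}",
    year="2020",
)
for award in result.awards:
    print(award.year, award.first_name, award.last_name, award.fields)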

Results

Results could likely be improved with repeated queries (whether a simple repeat-on-failure, auto-fixing, or a retry parser); see the sketch below.
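
One option in LangChain at the time was OutputFixingParser, which wraps an existing parser and, on a parse failure, asks the model to repair the malformed output. A sketch reusing the parser and llm objects from above (not something the experiments used):

from langchain.output_parsers import OutputFixingParser

# Sketch: on a parse failure, the wrapper sends the bad output together with
# the parse error back to the LLM and asks it for corrected JSON
fixing_parser = OutputFixingParser.from_llm(parser=parser, llm=llm)
result = fixing_parser.parse(output)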

(Model size estimates from here)

gavento commented 1 year ago

Resolved by #7 (cb48ab63cb60ae6be485835dcc25aab2bb0022ea)