explainers-by-googlers / prompt-api

A proposal for a web API for prompting browser-provided language models
Creative Commons Attribution 4.0 International

Support for guidance/structured output with prompt API #35

Open sushraja-msft opened 3 weeks ago

sushraja-msft commented 3 weeks ago

To aid programmability, reduce the compatibility risk of the API returning different results across browsers, and avoid challenges in updating a shipping model in the browser (for example, Google Model V1 to Google Model V2), please consider making techniques like guidance and structured outputs an integral part of the Prompt API.

Problem Illustration

Consider the following web developer scenarios, where a developer is:

  1. Classifying a product review as the user types, in order to ask follow-up questions.
  2. Building a chatbot and would like to programmatically detect whether a question should be routed a particular way.
  3. Building a reading-comprehension assistive extension that poses questions based on the web page content.


Web developers who attempt to parse the response are going to have a hard time writing code that is model/browser agnostic.

Constraining Output

One way to solve this problem is to use guidance or similar techniques. At a high level, these techniques work by restricting the next token the LLM is allowed to emit so that the output conforms to a grammar. Guidance works on top of a model, is model-agnostic, and only changes the logits from the last layer of the model before sampling. One additional implementation detail: to function, guidance requires information about all possible tokens prefixed with the next possible token (explanation).
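
To make the logit-masking idea concrete, here is a toy sketch. It is not the guidance library's implementation: the vocabulary, logit values, and allowed-token set are all made up, and a real grammar engine would recompute the allowed set at every decoding step.

```typescript
// Toy sketch of grammar-constrained decoding. All data here is hypothetical.

// Set the logit of every token the grammar disallows to -Infinity,
// so it can never be selected.
function maskLogits(logits: number[], allowed: Set<number>): number[] {
  return logits.map((value, id) => (allowed.has(id) ? value : -Infinity));
}

// Greedy selection: index of the largest value.
function argmax(values: number[]): number {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Hypothetical five-token vocabulary and raw logits for the next token.
const vocab = ["Sure", "positive", "negative", "neutral", "!"];
const logits = [3.2, 1.1, 2.4, 0.3, 0.9];

// Assume the grammar is: "positive" | "negative" | "neutral".
// At this step only token ids 1..3 may start a valid output.
const allowedIds = new Set([1, 2, 3]);

const unconstrained = vocab[argmax(logits)];
const constrained = vocab[argmax(maskLogits(logits, allowedIds))];

console.log(unconstrained); // the model's free choice
console.log(constrained);   // the best choice that still fits the grammar
```

The model itself is untouched; only the candidate set changes before sampling, which is why the approach carries over between models.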

With guidance (demo) we get better consistency across models and responses that are immediately parseable with JavaScript.
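
As a rough illustration of the difference, the sketch below contrasts parsing free-text replies with parsing a constrained one. The reply strings are invented for this example and do not come from any real model, and `parseFreeText` is a hypothetical helper.

```typescript
// Hypothetical free-text replies from two different models to the same
// "classify this review" prompt.
const freeTextReplies = [
  "Sentiment: Positive",
  "I'd say this review is mostly positive, though the author has some complaints.",
];

// A brittle parser keyed to one model's phrasing.
function parseFreeText(reply: string): string | null {
  const match = /^Sentiment:\s*(\w+)$/i.exec(reply);
  return match ? match[1].toLowerCase() : null;
}

const freeTextResults = freeTextReplies.map(parseFreeText);

// Hypothetical reply when output is constrained to a known JSON shape.
interface ReviewClassification {
  sentiment: "positive" | "negative" | "neutral";
}
const constrainedReply = '{"sentiment":"positive"}';
const parsed = JSON.parse(constrainedReply) as ReviewClassification;

console.log(freeTextResults);  // the second reply does not match the pattern
console.log(parsed.sentiment); // read directly from the parsed object
```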


Proposal

The proposal is to add responseJsonSchema to the AIAssistantPromptOptions.

dictionary AIAssistantPromptOptions {
  AbortSignal signal;
  DOMString? responseJsonSchema;
};

JSON Schema is familiar to web developers. However, JSON Schema is a superset of what techniques like guidance can achieve today. For example, some JSON Schema constraints, such as dependentRequired, cannot be enforced. Either the API can state that only Property Name, Value Type, Enum, and Arrays will be enforced, or the Prompt API should validate the response with a JSON Schema validator and indicate when the response is non-conformant. I have a slight preference for the first option because of its simplicity.
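
A sketch of what the first option could mean in practice: walk a schema and list the keywords that fall outside the enforced subset. The mapping of "Property Name, Value Type, Enum, Arrays" to the keywords `properties`, `type`, `enum`, and `items` is an assumption made for this example, and `findUnenforced` is a hypothetical helper, not part of any proposed API.

```typescript
type Schema = { [keyword: string]: unknown };

// Assumed enforceable subset; the actual list would be defined by the API.
const ENFORCED = new Set(["type", "properties", "enum", "items"]);

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns JSON-pointer-like paths of keywords that would not be enforced.
function findUnenforced(schema: Schema, path = "#"): string[] {
  const found: string[] = [];
  for (const [key, value] of Object.entries(schema)) {
    if (!ENFORCED.has(key)) {
      found.push(`${path}/${key}`);
    } else if (key === "properties" && isPlainObject(value)) {
      // Recurse into each named property's subschema.
      for (const [name, sub] of Object.entries(value)) {
        if (isPlainObject(sub)) {
          found.push(...findUnenforced(sub, `${path}/properties/${name}`));
        }
      }
    } else if (key === "items" && isPlainObject(value)) {
      // Recurse into the array element subschema.
      found.push(...findUnenforced(value, `${path}/items`));
    }
  }
  return found;
}

// Hypothetical schema for the product-review scenario.
const reviewSchema: Schema = {
  type: "object",
  properties: {
    sentiment: { type: "string", enum: ["positive", "negative", "neutral"] },
    tags: { type: "array", items: { type: "string" } },
  },
  dependentRequired: { tags: ["sentiment"] },
};

const unenforced = findUnenforced(reviewSchema);
console.log(unenforced); // lists the dependentRequired keyword
```

A check like this could let a developer see up front which parts of a schema are advisory only, without requiring a full validator in the API.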

Other Approaches

domenic commented 2 weeks ago

In general we're excited about exploring this. Minor API surface nitpicks:

So to summarize: object responseJSONSchema in the dictionary.