explainers-by-googlers / prompt-api

A proposal for a web API for prompting browser-provided language models
Creative Commons Attribution 4.0 International

API Shape, prompt/promptStreaming should use AIAssistantPrompt rather than DOMString input #28

Closed: sushraja-msft closed 1 month ago

sushraja-msft commented 3 months ago

The prompt API feels inconsistent: the create options dictionary AIAssistantCreateOptions takes AIAssistantPrompt values (through initialPrompts), while the prompt methods on the AIAssistant interface take a DOMString as input.

dictionary AIAssistantCreateOptions {
  AbortSignal signal;
  AICreateMonitorCallback monitor;

  DOMString systemPrompt;
  sequence<AIAssistantPrompt> initialPrompts;
  [EnforceRange] unsigned long topK;
  float temperature;
};

dictionary AIAssistantPrompt {
  AIAssistantPromptRole role;
  DOMString content;
};

interface AIAssistant : EventTarget {
  Promise<DOMString> prompt(DOMString input, optional AIAssistantPromptOptions options = {});
  ReadableStream promptStreaming(DOMString input, optional AIAssistantPromptOptions options = {});
};

Practically this means that the prompt / promptStreaming methods assume that the new input is necessarily for the user role.

This limits the API: when a function call result or another agent's response needs to be added to the conversation, the caller cannot break out of the user role.

It would be better to have:

  Promise<DOMString> prompt(AIAssistantPrompt input, optional AIAssistantPromptOptions options = {});
  ReadableStream promptStreaming(AIAssistantPrompt input, optional AIAssistantPromptOptions options = {});

This would allow tool call responses to be added to the chat with the assistant role. Supporting multiple agents is still not directly possible, but it can be managed with an AIAssistantPrompt such as:

<assistant> response from ratingAgent: This response looks appropriate </assistant>

for a hypothetical use case where the previous assistant message was a request for a rating agent to review a message.
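
To make the two cases above concrete, here is a minimal TypeScript sketch. It assumes the proposed signature, where prompt() accepts an AIAssistantPrompt. The helpers toolResultPrompt and agentResponsePrompt are hypothetical names that are not part of the explainer, and the "response from <agent>:" prefix is only the convention used in the example above.

```typescript
// Types mirroring the IDL quoted in this issue.
type AIAssistantPromptRole = "system" | "user" | "assistant";

interface AIAssistantPrompt {
  role: AIAssistantPromptRole;
  content: string;
}

// Hypothetical helper: wrap a tool call result as an assistant-role message.
function toolResultPrompt(result: string): AIAssistantPrompt {
  return { role: "assistant", content: result };
}

// Hypothetical helper: there is no per-agent role, so the agent name is
// carried inside the content of an assistant-role message.
function agentResponsePrompt(agent: string, response: string): AIAssistantPrompt {
  return { role: "assistant", content: `response from ${agent}: ${response}` };
}

const rating = agentResponsePrompt("ratingAgent", "This response looks appropriate");
```

With the proposed signature, a caller could then pass such a message straight to prompt(), for example `await assistant.prompt(rating)`, where `assistant` is assumed to be an existing AIAssistant instance.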

domenic commented 3 months ago

I think this makes sense. Looking at various API documentation and playing with some cases, although the examples all assume an alternating user/assistant role, the format doesn't seem to require that.

I'll add overloads for adding multiple messages at once, and keep the simple string version for user.
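
One way to picture those overloads is as a normalization step that turns every accepted input into a list of messages. The TypeScript sketch below is only an illustration under that assumption; normalizePromptInput and the AIAssistantPromptInput union are hypothetical names, not the eventual IDL.

```typescript
type AIAssistantPromptRole = "system" | "user" | "assistant";

interface AIAssistantPrompt {
  role: AIAssistantPromptRole;
  content: string;
}

// Hypothetical union covering the three input shapes discussed here:
// a plain string, a single message, or several messages at once.
type AIAssistantPromptInput = string | AIAssistantPrompt | AIAssistantPrompt[];

function normalizePromptInput(input: AIAssistantPromptInput): AIAssistantPrompt[] {
  if (typeof input === "string") {
    // The simple string version keeps its meaning: a user-role message.
    return [{ role: "user", content: input }];
  }
  if (Array.isArray(input)) {
    // Copy so later changes to the caller's array do not affect the result.
    return input.slice();
  }
  return [input];
}
```

Under this reading, `normalizePromptInput("hi")` and `normalizePromptInput({ role: "user", content: "hi" })` describe the same conversation turn, which keeps the string form as a shorthand rather than a separate code path.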

domenic commented 3 months ago

Additionally, I note that some APIs have "tool" and "function" as roles. We should consider adding those if we want to make tool use a fully supported part of the API. But there are more considerations there, so I'll leave them out for now, and people can use hacks as you suggest.

Yvem commented 2 months ago

I agree that there is a surprise here. I noticed the AIAssistantPromptRole = 'system' | 'user' | 'assistant' in the initialPrompts output property and wondered where it comes from since it's not in the input. Clarification welcome!