firebase / genkit

An open source framework for building AI-powered apps with familiar code-centric patterns. Genkit makes it easy to integrate, test, and deploy sophisticated AI features to Firebase or Google Cloud.
Apache License 2.0

[JS][Proposal] Streamlined Generation APIs #939

Open mbleigh opened 3 hours ago

mbleigh commented 3 hours ago

This is a proposed breaking API change for Genkit to streamline the most common scenarios while keeping the flexibility and capability level constant. The changes can be broken down into three components:

  1. Encouraging default model configurations
  2. Streamlining generation to return data directly instead of returning a wrapping response
  3. Separating out multi-turn and single-turn use cases

Default Model Configurations

While one of the strengths of Genkit is the ability to easily swap between multiple models, we find in practice that most people use a single model as their "go-to" with other models swapped in as needed. The same goes for model configuration -- most of the time you're going to want the same settings.

Proposed is to encourage setting a default model (now just called model) when initializing Genkit as well as the ability to define model settings when instantiating a reference to a model.

import { genkit } from "genkit";
import { vertexAI } from "@genkit-ai/vertexAI";

const ai = genkit({
  plugins: [vertexAI()],
  // sets a default model with configuration
  model: vertexAI.geminiModel('gemini-1.5-flash', {safetySettings: [...]}),
});

const claude = vertexAI.anthropicModel('claude-3.5-sonnet', {...claudeSettings});

Both model and configuration can still be overridden at call time, but this makes it easier to set a common reusable baseline.
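The override semantics described above could be sketched as a simple shallow merge, where call-time options win over the defaults set at initialization. The names `ModelConfig` and `resolveConfig` below are illustrative, not part of Genkit:

```typescript
// Sketch of the proposed override behavior: per-call options shallow-merge
// over the defaults configured when Genkit was initialized. Anything not
// overridden at call time falls back to the default.
type ModelConfig = {
  model: string;
  temperature?: number;
  maxOutputTokens?: number;
};

function resolveConfig(
  defaults: ModelConfig,
  override?: Partial<ModelConfig>,
): ModelConfig {
  // Call-time values take precedence; untouched fields keep their defaults.
  return { ...defaults, ...override };
}
```

With this shape, a call that only overrides `temperature` still inherits the default `model`.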

Streamlining Generation

Most of the time, what you want from a generate() call is the data that is being generated. Today this requires a two-line "get response, get output from response" pattern which gets tedious when working with e.g. multi-step processes.

Proposed is to simplify to a generate API that will return text or structured data depending on call configuration:

const jokeText = await ai.generate("Tell a funny joke.");

const fakePerson = await ai.generate({
  prompt: "Generate the information for an imaginary person named Annaka",
  schema: z.object({name: z.string(), job: z.string(), hobbies: z.array(z.string())}),
});

This can get more complex if you want it to:

const jokeAdvanced = await ai.generate({
  model: gpt,
  config: {...},
  prompt: {role: "user", content: [{text: "Tell a funny joke."}]},
});
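The string-vs-options dispatch above could be expressed as a function overload. The following is a minimal sketch with a stubbed model call, not Genkit's implementation; `GenerateOptions` and the echo output are assumptions for illustration:

```typescript
// Hypothetical typing for the proposed generate(): a bare string returns
// text, while an options object carrying a schema returns parsed data.
type GenerateOptions<T> = {
  prompt: string;
  schema?: { parse: (raw: unknown) => T };
};

async function generate(prompt: string): Promise<string>;
async function generate<T>(options: GenerateOptions<T>): Promise<T>;
async function generate<T>(
  arg: string | GenerateOptions<T>,
): Promise<string | T> {
  const prompt = typeof arg === "string" ? arg : arg.prompt;
  // Stand-in for a real model call.
  const raw = `model output for: ${prompt}`;
  if (typeof arg !== "string" && arg.schema) {
    // Structured path: validate/parse the model output against the schema.
    return arg.schema.parse({ text: raw });
  }
  // Plain-text path.
  return raw;
}
```

The overloads give callers the narrow return type (`string` or the schema's inferred type) without a cast at the call site.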

When developers do want to dig into the metadata of the response, they can use a new generateResponse method which will be equivalent to generate today.

const jokeResponse = await ai.generateResponse("Tell a funny joke.");
console.log(jokeResponse.text());
console.log(jokeResponse.stopReason);

Streaming will be supported through streamGenerate and streamGenerateResponse. When doing streamGenerate, the chunks emitted will be in output form (either a partial data response or a string chunk):

const {stream, data} = ai.streamGenerate("Tell a really long joke with at least 5 paragraphs.");

for await (const chunk of stream) {
  console.log(chunk); // chunk is just a string
}

console.log(await data); // this is the full result, equivalent to `generate()`

const {stream, response} = ai.streamGenerateResponse(...);
for await (const chunk of stream) {
  console.log(chunk.text()); // chunk is a Chunk instance
}

console.log((await response).usage);
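The `{stream, data}` pair could be built from an async generator that accumulates chunks and settles a promise once the stream is drained. This is an illustrative sketch with stand-in chunks, not the actual implementation:

```typescript
// Sketch of the proposed streamGenerate() shape: `stream` yields plain
// string chunks and `data` resolves to the full concatenated output.
function streamGenerate(prompt: string): {
  stream: AsyncIterable<string>;
  data: Promise<string>;
} {
  // Stand-in chunks; a real call would stream tokens from the model.
  const chunks = [`${prompt} -- part 1, `, "part 2, ", "part 3."];
  let resolveData!: (full: string) => void;
  const data = new Promise<string>((resolve) => (resolveData = resolve));
  async function* gen() {
    let full = "";
    for (const c of chunks) {
      full += c;
      yield c;
    }
    // Note: in this sketch, `data` settles only after the stream is drained.
    resolveData(full);
  }
  return { stream: gen(), data };
}
```

One design wrinkle this sketch surfaces: a production implementation would also need `data` to resolve for callers who await it without iterating the stream.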

Multi-Turn Generation

All of the above is great if you only have single-turn generation, but it doesn't really help for a chatbot scenario. Fundamentally, multi-turn use cases are pretty different and deserve better attention in the API surface.

Proposed is a new Chat class and a new send() method that lets you explicitly opt-in to multi-turn conversational use cases.

const chat = ai.chat({
  system: "You are a pirate.",
});

const reply = await chat.send("How are you today?");
console.log(reply);
// "Yarr, not too bad, matey. How be ye?"
const {stream, data} = await chat.streamSend("Tell me a long story, ye scurvy sea dog!");
chat.messages(); // equivalent to `toHistory()` in current Genkit

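A minimal sketch of this `Chat` surface, with a stubbed model, might look like the following. The `Message` shape and the echo reply are assumptions for illustration; a real `Chat` would call the configured model with the accumulated history:

```typescript
// Illustrative Chat sketch: state is the running message history, send()
// appends a user turn plus a model turn, and messages() exposes history
// (the proposal's equivalent of toHistory() in current Genkit).
type Message = { role: "system" | "user" | "model"; content: string };

class Chat {
  private history: Message[] = [];

  constructor(options?: { system?: string }) {
    if (options?.system) {
      this.history.push({ role: "system", content: options.system });
    }
  }

  async send(text: string): Promise<string> {
    this.history.push({ role: "user", content: text });
    const reply = `echo: ${text}`; // stand-in for a real model call
    this.history.push({ role: "model", content: reply });
    return reply;
  }

  messages(): Message[] {
    return [...this.history];
  }
}
```

Keeping the history inside the `Chat` instance is what distinguishes this from `generate()`: each `send()` implicitly carries the prior turns.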
chrisraygill commented 2 hours ago

I generally like it - but a few questions:

1. Streaming with generateX

streamGenerate() today is generateStream(). Do you mean to change that, or was it just an oversight? Otherwise:

ai.streamGenerate() --> ai.generateStream()
ai.streamGenerateResponse() --> ai.generateResponseStream()

2. Streaming for multi-turn generation

How do you get a streamed response from ai.send()?

If we're being consistent with generate(), then it would be ai.sendStream().

3. Arguments for ai.send

Does ai.send accept the same arguments as ai.generateResponse()? Does it return the same response object?

If so, what's the difference between the two?

mbleigh commented 2 hours ago

> I generally like it - but a few questions:
>
> 1. Streaming with generateX
>
> streamGenerate() today is generateStream(). Do you mean to change that, or was it just an oversight? Otherwise:

Hmm, mostly accidental but maybe intentional after some thought. The problem is that generateStream makes sense but sendStream sounds like you're sending the stream, not receiving one back.

> ai.streamGenerate() --> ai.generateStream()
> ai.streamGenerateResponse() --> ai.generateResponseStream()

> 2. Streaming for multi-turn generation
>
> How do you get a streamed response from ai.send()?
>
> If we're being consistent with generate(), then it would be ai.sendStream().

Yeah, forgot to write that up, ai.streamSend would be the proposal.

> 3. Arguments for ai.send
>
> Does ai.send accept the same arguments as ai.generateResponse()? Does it return the same response object?
>
> If so, what's the difference between the two?

I'm imagining them as being two things, but they're really really similar so it's maybe a judgment call if they deserve to be different things. I'm imagining generateResponse returns a GenerateResponse which does not necessarily have send() on it.

But maybe...maybe they are just the same thing, and the extra "stuff you want to do with the response" of send() means that it's also sufficient for "single-turn but want more metadata".

I like the idea of calling this a Conversation, but in theory it could maybe replace GenerateResponse? Hmm...