firebase / genkit

An open source framework for building AI-powered apps with familiar code-centric patterns. Genkit makes it easy to integrate, test, and deploy sophisticated AI features to Firebase or Google Cloud.
Apache License 2.0

[Bug] gemini-1.5-pro-latest returns error when combining tools and output schema #703

Open lazakrisz opened 1 month ago

lazakrisz commented 1 month ago

Describe the bug

When using Gemini 1.5 Pro with tool calling (function calling) and specifying an output schema within generate, the generate call returns an error:

[GoogleGenerativeAI Error]: Error fetching from https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent: [400 Bad Request] Function calling with a response mime type: 'application/json' is unsupported.

Interestingly, after removing the output.schema property, the generate function does output valid JSON (see screenshot below).

image

and here is the topmost part of the response:

image

for clarity here is the entire response / logs cycle:

[1] >  Request[generateQuizQuestions] {
[1] >    flowName: 'generateQuizQuestions',
[1] >    headers: {
[1] >      host: '127.0.0.1:5001',
[1] >      connection: 'keep-alive',
[1] >      'content-length': '42',
[1] >      pragma: 'no-cache',
[1] >      'cache-control': 'no-cache',
[1] >      'sec-ch-ua': '"Not)A;Brand";v="99", "Google Chrome";v="127", "Chromium";v="127"',
[1] >      'sec-ch-ua-platform': '"Android"',
[1] >      'sec-ch-ua-mobile': '?1',
[1] >      authorization: '<redacted>',
[1] >      'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Mobile Safari/537.36',
[1] >      'content-type': 'application/json',
[1] >      accept: '*/*',
[1] >      origin: 'http://localhost:3000',
[1] >      'sec-fetch-site': 'cross-site',
[1] >      'sec-fetch-mode': 'cors',
[1] >      'sec-fetch-dest': 'empty',
[1] >      referer: 'http://localhost:3000/',
[1] >      'accept-encoding': 'gzip, deflate, br, zstd',
[1] >      'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8'
[1] >    },
[1] >    params: { '0': '' },
[1] >    body: { data: { quizId: '0qfxuNZiTgBsQ1ZXYpQ5' } },
[1] >    query: {},
[1] >    originalUrl: '/',
[1] >    path: '',
[1] >    qualifiedPath: '',
[1] >    source: 'ts',
[1] >    sourceVersion: '0.5.8'
[1] >  }
[1] >  Initializing plugin googleai:
[1] >  Registering model: googleai/gemini-pro
[1] >  Registering model: googleai/gemini-pro-vision
[1] >  Registering model: googleai/gemini-1.5-pro-latest
[1] >  Registering model: googleai/gemini-1.5-flash-latest
[1] >  Registering embedder: googleai/embedding-001
[1] >  Config[generateQuizQuestions > googleai/gemini-1.5-pro-latest, googleai/gemini-1.5-pro-latest] {
[1] >    model: 'googleai/gemini-1.5-pro-latest',
[1] >    path: 'generateQuizQuestions > googleai/gemini-1.5-pro-latest',
[1] >    qualifiedPath: '/{generateQuizQuestions,t:flow}/{googleai/gemini-1.5-pro-latest,t:action}',
[1] >    flowName: 'generateQuizQuestions',
[1] >    temperature: undefined,
[1] >    topK: undefined,
[1] >    topP: undefined,
[1] >    maxOutputTokens: undefined,
[1] >    stopSequences: undefined,
[1] >    source: 'ts',
[1] >    sourceVersion: '0.5.8'
[1] >  }
[1] >  Input[generateQuizQuestions > googleai/gemini-1.5-pro-latest, googleai/gemini-1.5-pro-latest]  {
[1] >    model: 'googleai/gemini-1.5-pro-latest',
[1] >    path: 'generateQuizQuestions > googleai/gemini-1.5-pro-latest',
[1] >    qualifiedPath: '/{generateQuizQuestions,t:flow}/{googleai/gemini-1.5-pro-latest,t:action}',
[1] >    flowName: 'generateQuizQuestions',
[1] >    content: '\n' +
[1] >      '        Respond as JSON only.\n' +
[1] >      '        You are a helpful assistant. Using the tools available, try to generate questions and appropriate answers for a quiz.\n' +
[1] >      "        You should only generate n number of questions, n is the number of questions that should be generated based on the tool's response.\n" +
[1] >      '        You should take the following criteria in mind:\n' +
[1] >      '          * generate questions and answers based on the topics that the user has chosen, you may use the appropriate tools to get this information, you may use any distribution to generate the questions from the provided topics randomly\n' +
[1] >      '          * generate questions and answers based on the questions that the user has chosen, you may use the appropriate tools to get this information, you must include these questions and you must only provide potential asnwers to these questions\n' +
[1] >      '          * you may also generate more than one potential answers to a given question\n' +
[1] >      '        \n' +
[1] >      '        QuizId: 0qfxuNZiTgBsQ1ZXYpQ5\n' +
[1] >      '      ',
[1] >    partIndex: 0,
[1] >    totalParts: 1,
[1] >    messageIndex: 0,
[1] >    totalMessages: 1
[1] >  }
[1] >  Output[generateQuizQuestions > googleai/gemini-1.5-pro-latest, googleai/gemini-1.5-pro-latest]  {
[1] >    model: 'googleai/gemini-1.5-pro-latest',
[1] >    path: 'generateQuizQuestions > googleai/gemini-1.5-pro-latest',
[1] >    qualifiedPath: '/{generateQuizQuestions,t:flow}/{googleai/gemini-1.5-pro-latest,t:action}',
[1] >    flowName: 'generateQuizQuestions',
[1] >    content: 'Tool request: quizTool, ref: undefined, input: {"quizId":"0qfxuNZiTgBsQ1ZXYpQ5"}',
[1] >    partIndex: 0,
[1] >    totalParts: 1,
[1] >    candidateIndex: 0,
[1] >    totalCandidates: 1,
[1] >    messageIndex: 0,
[1] >    finishReason: 'stop'
[1] >  }
[1] >  Config[generateQuizQuestions > googleai/gemini-1.5-pro-latest, googleai/gemini-1.5-pro-latest] {
[1] >    model: 'googleai/gemini-1.5-pro-latest',
[1] >    path: 'generateQuizQuestions > googleai/gemini-1.5-pro-latest',
[1] >    qualifiedPath: '/{generateQuizQuestions,t:flow}/{googleai/gemini-1.5-pro-latest,t:action}',
[1] >    flowName: 'generateQuizQuestions',
[1] >    temperature: undefined,
[1] >    topK: undefined,
[1] >    topP: undefined,
[1] >    maxOutputTokens: undefined,
[1] >    stopSequences: undefined,
[1] >    source: 'ts',
[1] >    sourceVersion: '0.5.8'
[1] >  }
[1] >  Input[generateQuizQuestions > googleai/gemini-1.5-pro-latest, googleai/gemini-1.5-pro-latest] (message 0 of 3) {
[1] >    model: 'googleai/gemini-1.5-pro-latest',
[1] >    path: 'generateQuizQuestions > googleai/gemini-1.5-pro-latest',
[1] >    qualifiedPath: '/{generateQuizQuestions,t:flow}/{googleai/gemini-1.5-pro-latest,t:action}',
[1] >    flowName: 'generateQuizQuestions',
[1] >    content: '\n' +
[1] >      '        Respond as JSON only.\n' +
[1] >      '        You are a helpful assistant. Using the tools available, try to generate questions and appropriate answers for a quiz.\n' +
[1] >      "        You should only generate n number of questions, n is the number of questions that should be generated based on the tool's response.\n" +
[1] >      '        You should take the following criteria in mind:\n' +
[1] >      '          * generate questions and answers based on the topics that the user has chosen, you may use the appropriate tools to get this information, you may use any distribution to generate the questions from the provided topics randomly\n' +
[1] >      '          * generate questions and answers based on the questions that the user has chosen, you may use the appropriate tools to get this information, you must include these questions and you must only provide potential asnwers to these questions\n' +
[1] >      '          * you may also generate more than one potential answers to a given question\n' +
[1] >      '        \n' +
[1] >      '        QuizId: 0qfxuNZiTgBsQ1ZXYpQ5\n' +
[1] >      '      ',
[1] >    partIndex: 0,
[1] >    totalParts: 1,
[1] >    messageIndex: 0,
[1] >    totalMessages: 3
[1] >  }
[1] >  Input[generateQuizQuestions > googleai/gemini-1.5-pro-latest, googleai/gemini-1.5-pro-latest] (message 1 of 3) {
[1] >    model: 'googleai/gemini-1.5-pro-latest',
[1] >    path: 'generateQuizQuestions > googleai/gemini-1.5-pro-latest',
[1] >    qualifiedPath: '/{generateQuizQuestions,t:flow}/{googleai/gemini-1.5-pro-latest,t:action}',
[1] >    flowName: 'generateQuizQuestions',
[1] >    content: 'Tool request: quizTool, ref: undefined, input: {"quizId":"0qfxuNZiTgBsQ1ZXYpQ5"}',
[1] >    partIndex: 0,
[1] >    totalParts: 1,
[1] >    messageIndex: 1,
[1] >    totalMessages: 3
[1] >  }
[1] >  Input[generateQuizQuestions > googleai/gemini-1.5-pro-latest, googleai/gemini-1.5-pro-latest] (message 2 of 3) {
[1] >    model: 'googleai/gemini-1.5-pro-latest',
[1] >    path: 'generateQuizQuestions > googleai/gemini-1.5-pro-latest',
[1] >    qualifiedPath: '/{generateQuizQuestions,t:flow}/{googleai/gemini-1.5-pro-latest,t:action}',
[1] >    flowName: 'generateQuizQuestions',
[1] >    content: 'Tool response: quizTool, ref: undefined, output: {"numberOfQuestions":16,"userQuestions":["User question test?","Another user question?"],"userTopics":["Geography","Animals"]}',
[1] >    partIndex: 0,
[1] >    totalParts: 1,
[1] >    messageIndex: 2,
[1] >    totalMessages: 3
[1] >  }
[1] >  Output[generateQuizQuestions > googleai/gemini-1.5-pro-latest, googleai/gemini-1.5-pro-latest]  {
[1] >    model: 'googleai/gemini-1.5-pro-latest',
[1] >    path: 'generateQuizQuestions > googleai/gemini-1.5-pro-latest',
[1] >    qualifiedPath: '/{generateQuizQuestions,t:flow}/{googleai/gemini-1.5-pro-latest,t:action}',
[1] >    flowName: 'generateQuizQuestions',
[1] >    content: '```json\n' +
[1] >      `[{"question": "What is the capital of France?", "options": ["Paris", "Berlin", "Madrid", "Rome"], "answer": "Paris"}, {"question": "What is the highest mountain in the world?", "options": ["Mount Everest", "K2", "Kangchenjunga", "Lhotse"], "answer": "Mount Everest"}, {"question": "What is the largest ocean in the world?", "options": ["Pacific Ocean", "Atlantic Ocean", "Indian Ocean", "Arctic Ocean"], "answer": "Pacific Ocean"}, {"question": "What is the smallest country in the world?", "options": ["Vatican City", "Monaco", "Nauru", "Tuvalu"], "answer": "Vatican City"}, {"question": "What is the largest country in the world?", "options": ["Russia", "Canada", "China", "United States"], "answer": "Russia"}, {"question": "What is the largest animal in the world?", "options": ["Blue Whale", "African Elephant", "Giraffe", "Hippopotamus"], "answer": "Blue Whale"}, {"question": "What is the fastest land animal in the world?", "options": ["Cheetah", "Lion", "Tiger", "Leopard"], "answer": "Cheetah"}, {"question": "What is the tallest animal in the world?", "options": ["Giraffe", "Elephant", "Hippopotamus", "Rhinoceros"], "answer": "Giraffe"}, {"question": "User question test?", "options": ["Answer 1", "Answer 2", "Answer 3"], "answer": null}, {"question": "Another user question?", "options": ["Answer 1", "Answer 2"], "answer": null}, {"question": "What animal is known as the 'King of the Jungle'?", "options": ["Lion", "Tiger", "Leopard", "Jaguar"], "answer": "Lion"}, {"question": "Which continent is known as the 'Dark Continent'?", "options": ["Africa", "Asia", "Europe", "North America"], "answer": "Africa"}, {"question": "What is the capital of Australia?", "options": ["Canberra", "Sydney", "Melbourne", "Brisbane"], "answer": "Canberra"}, {"question": "What is the largest desert in the world?", "options": ["Antarctic Polar Desert", "Arctic Polar Desert", "Sahara Desert", "Arabian Desert"], "answer": "Antarctic Polar Desert"}, {"question": "What is the longest river in the world?", "options": ["Nile River", "Amazon River", "Yangtze River", "Mississippi River"], "answer": "Nile River"}, {"question": "What is the smallest continent in the world?", "options": ["Australia", "Europe", "Antarctica", "South America"], "answer": "Australia"}]\n` +
[1] >      '```',
[1] >    partIndex: 0,
[1] >    totalParts: 1,
[1] >    candidateIndex: 0,
[1] >    totalCandidates: 1,
[1] >    messageIndex: 0,
[1] >    finishReason: 'stop'
[1] >  }
[1] >  output:  {
[1] >    output: [
[1] >      {
[1] >        question: 'What is the capital of France?',
[1] >        options: [Array],
[1] >        answer: 'Paris'
[1] >      },
[1] >      {
[1] >        question: 'What is the highest mountain in the world?',
[1] >        options: [Array],
[1] >        answer: 'Mount Everest'
[1] >      },
[1] >      {
[1] >        question: 'What is the largest ocean in the world?',
[1] >        options: [Array],
[1] >        answer: 'Pacific Ocean'
[1] >      },
[1] >      {
[1] >        question: 'What is the smallest country in the world?',
[1] >        options: [Array],
[1] >        answer: 'Vatican City'
[1] >      },
[1] >      {
[1] >        question: 'What is the largest country in the world?',
[1] >        options: [Array],
[1] >        answer: 'Russia'
[1] >      },
[1] >      {
[1] >        question: 'What is the largest animal in the world?',
[1] >        options: [Array],
[1] >        answer: 'Blue Whale'
[1] >      },
[1] >      {
[1] >        question: 'What is the fastest land animal in the world?',
[1] >        options: [Array],
[1] >        answer: 'Cheetah'
[1] >      },
[1] >      {
[1] >        question: 'What is the tallest animal in the world?',
[1] >        options: [Array],
[1] >        answer: 'Giraffe'
[1] >      },
[1] >      { question: 'User question test?', options: [Array], answer: null },
[1] >      {
[1] >        question: 'Another user question?',
[1] >        options: [Array],
[1] >        answer: null
[1] >      },
[1] >      {
[1] >        question: "What animal is known as the 'King of the Jungle'?",
[1] >        options: [Array],
[1] >        answer: 'Lion'
[1] >      },
[1] >      {
[1] >        question: "Which continent is known as the 'Dark Continent'?",
[1] >        options: [Array],
[1] >        answer: 'Africa'
[1] >      },
[1] >      {
[1] >        question: 'What is the capital of Australia?',
[1] >        options: [Array],
[1] >        answer: 'Canberra'
[1] >      },
[1] >      {
[1] >        question: 'What is the largest desert in the world?',
[1] >        options: [Array],
[1] >        answer: 'Antarctic Polar Desert'
[1] >      },
[1] >      {
[1] >        question: 'What is the longest river in the world?',
[1] >        options: [Array],
[1] >        answer: 'Nile River'
[1] >      },
[1] >      {
[1] >        question: 'What is the smallest continent in the world?',
[1] >        options: [Array],
[1] >        answer: 'Australia'
[1] >      }
[1] >    ],
[1] >    type: 'object'
[1] >  }
[1] >  Paths[generateQuizQuestions] {
[1] >    flowName: 'generateQuizQuestions',
[1] >    paths: [
[1] >      'generateQuizQuestions > googleai/gemini-1.5-pro-latest',
[1] >      'generateQuizQuestions > quizTool',
[1] >      'generateQuizQuestions > googleai/gemini-1.5-pro-latest'
[1] >    ]
[1] >  }
[1] >  Using DevFlowStateStore. Root: /var/folders/kq/p7xl1bfj73980g9ws4qpzl3c0000gn/T/.genkit/6d679589bfa70f0f24822e2a463d77e0/flows
[1] >  save flow state e0ec94dd-5147-4ca7-a27b-85acd969cced
[1] >  Error[generateQuizQuestions, TypeError] {
[1] >    path: 'generateQuizQuestions',
[1] >    qualifiedPath: '/{generateQuizQuestions,t:flow}',
[1] >    name: 'TypeError',
[1] >    message: 'output.questions is not iterable',
[1] >    stack: 'TypeError: output.questions is not iterable\n' +
[1] >      '    at /Users/krisztianlazar/work/ai-challenge/ai/apps/ai/lib/flows/generate-quiz-questions.js:79:35\n' +
[1] >      '    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)',
[1] >    source: 'ts',
[1] >    sourceVersion: '0.5.8'
[1] >  }
[1] >  Error[, Error] {
[1] >    path: '',
[1] >    qualifiedPath: '',
[1] >    name: 'Error',
[1] >    message: 'output.questions is not iterable',
[1] >    stack: 'Error: output.questions is not iterable\n' +
[1] >      '    at Flow.<anonymous> (/Users/krisztianlazar/work/ai-challenge/node_modules/@genkit-ai/flow/lib/flow.js:493:19)\n' +
[1] >      '    at Generator.next (<anonymous>)\n' +
[1] >      '    at fulfilled (/Users/krisztianlazar/work/ai-challenge/node_modules/@genkit-ai/flow/lib/flow.js:36:24)\n' +
[1] >      '    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)',
[1] >    source: 'ts',
[1] >    sourceVersion: '0.5.8'
[1] >  }
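
The log above points at why the flow ends in "output.questions is not iterable": without output.schema the model returns a bare array (wrapped in a Markdown code fence in the raw text), not an object with a questions property. A minimal sketch of post-processing that tolerates both, assuming the raw model text is available as a string. The helper names stripCodeFence, parseModelJson and toQuestionsObject are hypothetical and not part of Genkit:

```typescript
// Three backticks, built programmatically so this example contains no literal fence.
const FENCE = "`" + "`" + "`";

// Remove an opening fence line (for example the fence followed by "json")
// and a trailing fence, if present.
function stripCodeFence(text: string): string {
  let body = text.trim();
  if (body.indexOf(FENCE) === 0) {
    const firstNewline = body.indexOf("\n");
    body = firstNewline === -1 ? "" : body.slice(firstNewline + 1);
  }
  if (body.slice(-FENCE.length) === FENCE) {
    body = body.slice(0, body.length - FENCE.length);
  }
  return body.trim();
}

// Parse the model text as JSON after removing any surrounding fence.
function parseModelJson(text: string): unknown {
  return JSON.parse(stripCodeFence(text));
}

// Accept either a bare array or an object with a "questions" array,
// and always return the object form that the flow iterates over.
function toQuestionsObject(parsed: unknown): { questions: unknown[] } {
  if (Array.isArray(parsed)) {
    return { questions: parsed };
  }
  if (typeof parsed === "object" && parsed !== null) {
    const questions = (parsed as { questions?: unknown }).questions;
    if (Array.isArray(questions)) {
      return { questions: questions };
    }
  }
  throw new Error("Model output does not contain a list of questions");
}
```

This only normalizes the outer container. The individual items in the log also differ from the desired schema (answer is a string or null, with a separate options array), so they would still need mapping or validation.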

I've also tried the Vertex AI plugin, but got a similarly esoteric error message. I then tried the following manual tool invocation; however, generate once again throws an error:

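import { generate } from "@genkit-ai/ai";
// Type of the tool response parts collected before the second generate call.
import type { ToolResponsePart } from "@genkit-ai/ai/model";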
import { firebaseAuth } from "@genkit-ai/firebase/auth";
import { noAuth, onFlow } from "@genkit-ai/firebase/functions";
import { gemini15Pro } from "@genkit-ai/googleai";
import { QuizQuestions } from "@repo/db/src/firestore";
import { generateQuizQuestionsFlowSchema } from "@repo/schemas";
import { getApp } from "firebase-admin/app";
import { HttpsError } from "firebase-functions/v2/https";
import { getFirelord, getFirestore, writeBatch } from "firelord";
import * as z from "zod";
import { quizTool } from "../tools/quiz-tool";

// Firestore handle used by getFirelord and writeBatch below.
const db = getFirestore();

export const generateQuizQuestions = onFlow(
  {
    name: "generateQuizQuestions",
    inputSchema: generateQuizQuestionsFlowSchema.inputSchema,
    outputSchema: generateQuizQuestionsFlowSchema.outputSchema,
    authPolicy: process.env.FUNCTIONS_EMULATOR
      ? noAuth()
      : firebaseAuth((user) => {
          if (!user || !user.isAdmin) {
            throw new HttpsError("unauthenticated", "User must be logged in!");
          }
        }),
  },
  async (input, streamingCallback) => {
    if (streamingCallback) {
      throw new HttpsError(
        "failed-precondition",
        "This flow cannot be invoked using streaming!",
      );
    }

    const llmResponse = await generate({
      model: gemini15Pro,
      prompt: `
        Respond as JSON only.
        You are a helpful assistant. Using the tools available, try to generate questions and appropriate answers for a quiz.
        You should only generate n number of questions, n is the number of questions that should be generated based on the tool's response.
        You should take the following criteria in mind:
          * generate questions and answers based on the topics that the user has chosen, you may use the appropriate tools to get this information, you may use any distribution to generate the questions from the provided topics randomly
          * generate questions and answers based on the questions that the user has chosen, you may use the appropriate tools to get this information, you must include these questions and you must only provide potential asnwers to these questions
          * you may also generate more than one potential answers to a given question

        QuizId: ${input.quizId}
      `,
      tools: [quizTool],
      returnToolRequests: true,
    });

    const toolRequests = llmResponse.toolRequests();

    const toolResponses: ToolResponsePart[] = [];

    for (const toolRequest of toolRequests) {
      if (toolRequest.toolRequest.name === "quizTool") {
        const toolResponse = await quizTool(
          toolRequest.toolRequest.input as any,
        );

        toolResponses.push({
          toolResponse: { name: "quizTool", output: toolResponse, ref: "" },
        });
      }
    }

    const response = await generate({
      model: gemini15Pro,
      prompt: toolResponses,
      output: {
        schema: z
          .object({
            questions: z
              .array(
                z.object({
                  question: z
                    .string()
                    .describe("The questions for the given quiz question."),
                  answer: z
                    .array(z.string())
                    .describe("The array of answers to the question."),
                }),
              )
              .describe(
                "An array of objects that contains the question and answers to that question.",
              ),
          })
          .describe(
            "JavaScript Object which contains the questions property that indicates the questions for the user",
          ),
      },
    });

    // The structured result comes from the second call; the first call
    // only returns tool requests because returnToolRequests is set.
    const output = response.output();

    console.log("output: ", { output });

    const quizId = input.quizId;

    if (!output) {
      throw new HttpsError("internal", "Unable to generate questions!");
    }

    const quizQuestionsRef = getFirelord<QuizQuestions>(
      db,
      "quizes",
      "quizQuestions",
    );

    const batch = writeBatch(db);

    for (const question of output.questions) {
      const quizQuestionDoc = quizQuestionsRef.doc(
        quizQuestionsRef.collection(quizId),
      );

      batch.create(quizQuestionDoc, {
        answers: question.answer.map((answer) => ({ answer })),
        question: question.question,
        isChosen: null,
        isRemoved: null,
      });
    }

    await batch.commit();

    return { status: "done" as const };
  },
);
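
One possible direction for the manual approach above, sketched under assumptions: rather than passing only the tool response parts as the second prompt, fold the original instructions and the tool output into a single plain-text prompt, so that the second generate call can carry output.schema without any tools. Whether this avoids the 400 error has not been verified against the real API. ToolResult and buildFollowUpPrompt are hypothetical names used only for illustration:

```typescript
// Hypothetical container for one executed tool call.
interface ToolResult {
  name: string;
  output: unknown;
}

// Build a text-only prompt for a second model call that uses an output
// schema but no tools: the original instructions followed by each tool
// result rendered as pretty-printed JSON.
function buildFollowUpPrompt(instructions: string, results: ToolResult[]): string {
  const sections = results.map(function (result) {
    return (
      "Result of tool " + result.name + ":\n" +
      JSON.stringify(result.output, null, 2)
    );
  });
  return [instructions.trim()].concat(sections).join("\n\n");
}

// Example input mirroring the quizTool response seen in the logs.
const followUpPrompt = buildFollowUpPrompt(
  "Generate quiz questions and answers as described.",
  [
    {
      name: "quizTool",
      output: {
        numberOfQuestions: 16,
        userQuestions: ["User question test?", "Another user question?"],
        userTopics: ["Geography", "Animals"],
      },
    },
  ]
);
```

The resulting string would be passed as the prompt of the second generate call, alongside output.schema and without the tools option.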

the original source code which throws an error:

import { generate } from "@genkit-ai/ai";
import { firebaseAuth } from "@genkit-ai/firebase/auth";
import { noAuth, onFlow } from "@genkit-ai/firebase/functions";
import { gemini15Pro } from "@genkit-ai/googleai";
import { QuizQuestions } from "@repo/db/src/firestore";
import { generateQuizQuestionsFlowSchema } from "@repo/schemas";
import { getApp } from "firebase-admin/app";
import { HttpsError } from "firebase-functions/v2/https";
import { getFirelord, getFirestore, writeBatch } from "firelord";
import * as z from "zod";
import { quizTool } from "../tools/quiz-tool";

// Firestore handle used by getFirelord and writeBatch below.
const db = getFirestore();

export const generateQuizQuestions = onFlow(
  {
    name: "generateQuizQuestions",
    inputSchema: generateQuizQuestionsFlowSchema.inputSchema,
    outputSchema: generateQuizQuestionsFlowSchema.outputSchema,
    authPolicy: process.env.FUNCTIONS_EMULATOR
      ? noAuth()
      : firebaseAuth((user) => {
          if (!user || !user.isAdmin) {
            throw new HttpsError("unauthenticated", "User must be logged in!");
          }
        }),
  },
  async (input, streamingCallback) => {
    if (streamingCallback) {
      throw new HttpsError(
        "failed-precondition",
        "This flow cannot be invoked using streaming!",
      );
    }

    const llmResponse = await generate({
      model: gemini15Pro,
      prompt: `
        Respond as JSON only.
        You are a helpful assistant. Using the tools available, try to generate questions and appropriate answers for a quiz.
        You should only generate n number of questions, n is the number of questions that should be generated based on the tool's response.
        You should take the following criteria in mind:
          * generate questions and answers based on the topics that the user has chosen, you may use the appropriate tools to get this information, you may use any distribution to generate the questions from the provided topics randomly
          * generate questions and answers based on the questions that the user has chosen, you may use the appropriate tools to get this information, you must include these questions and you must only provide potential asnwers to these questions
          * you may also generate more than one potential answers to a given question

        QuizId: ${input.quizId}
      `,
      tools: [quizTool],
      output: {
        schema: z
          .object({
            questions: z
              .array(
                z.object({
                  question: z
                    .string()
                    .describe("The questions for the given quiz question."),
                  answer: z
                    .array(z.string())
                    .describe("The array of answers to the question."),
                }),
              )
              .describe(
                "An array of objects that contains the question and answers to that question.",
              ),
          })
          .describe(
            "JavaScript Object which contains the questions property that indicates the questions for the user",
          ),
      },
    });

    const quizId = input.quizId;
    const output = llmResponse.output();

    if (!output) {
      throw new HttpsError("internal", "Unable to generate questions!");
    }

    const quizQuestionsRef = getFirelord<QuizQuestions>(
      db,
      "quizes",
      "quizQuestions",
    );

    const batch = writeBatch(db);

    for (const question of output.questions) {
      const quizQuestionDoc = quizQuestionsRef.doc(
        quizQuestionsRef.collection(quizId),
      );

      batch.create(quizQuestionDoc, {
        answers: question.answer.map((answer) => ({ answer })),
        question: question.question,
        isChosen: null,
        isRemoved: null,
      });
    }

    await batch.commit();

    return { status: "done" as const };
  },
);

here is the quizTool, which is the same in both cases:

import { defineTool } from "@genkit-ai/ai";
import { Quizes } from "@repo/db/src/firestore";
import { error } from "firebase-functions/logger";
import { HttpsError } from "firebase-functions/v2/https";
import { getDoc, getFirelord, getFirestore } from "firelord";
import { z } from "zod";

const db = getFirestore();

export const quizTool = defineTool(
  {
    name: "quizTool",
    description: `
    This tool is useful for getting information about a quiz. The information includes the following:
      * the number of questions that should be generated
      * the topics that the user has chosen
      * the user questions that should be included in the questions
  `,
    inputSchema: z
      .object({
        quizId: z
          .string()
          .describe(
            "The ID of the quiz about which you require more information.",
          ),
      })
      .describe(
        "JavaScript object which contains the quiz id property, which should be the id of the quiz that you require the information about.",
      ),
    outputSchema: z
      .object({
        numberOfQuestions: z
          .number()
          .describe("The amount of questions that should be generated"),
        userQuestions: z
          .array(z.string())
          .describe(
            "The questions that were chosen by the user and must be included in the generated question set with the provided answers",
          ),
        userTopics: z
          .array(z.string())
          .describe(
            "The topics that were chosen by the user for which you must generate questions and the appropriate answers.",
          ),
      })
      .describe(
        "JavaScript object which contains the numberOfQuestions property, userQuestions and userTopics properties.",
      ),
  },
  async (input) => {
    const quizId = input.quizId;

    const quizesRef = getFirelord<Quizes>(db, "quizes");
    const quizDoc = await getDoc(quizesRef.doc(quizId));

    if (!quizDoc.exists) {
      throw new HttpsError(
        "not-found",
        `Unable to find the given quiz with id: ${quizId}`,
      );
    }

    const docData = quizDoc.data();

    if (!docData) {
      throw new HttpsError("not-found", "unable to find quiz!");
    }

    if (
      !docData.numberOfQuestions ||
      !docData.chosenQuestions ||
      !docData.chosenTopics
    ) {
      error(
        `the quiz (id: ${quizId}) is missing at least one of the required parameters!`,
      );

      throw new HttpsError("internal", "Internal Error Occurred");
    }

    return {
      numberOfQuestions: docData.numberOfQuestions,
      userQuestions: docData.chosenQuestions,
      userTopics: docData.chosenTopics,
    };
  },
);

If I remove output.schema, everything works and I get an object as a response, but the object takes an arbitrary shape instead of the shape I want, which makes parsing quite tedious.

here is an example:

import { generate } from "@genkit-ai/ai";
import { firebaseAuth } from "@genkit-ai/firebase/auth";
import { noAuth, onFlow } from "@genkit-ai/firebase/functions";
import { gemini15Pro } from "@genkit-ai/googleai";
import { QuizQuestions } from "@repo/db/src/firestore";
import { generateQuizQuestionsFlowSchema } from "@repo/schemas";
import { getApp } from "firebase-admin/app";
import { HttpsError } from "firebase-functions/v2/https";
import { getFirelord, getFirestore, writeBatch } from "firelord";
// import * as z from "zod";
import { quizTool } from "../tools/quiz-tool";

const app = getApp();
const db = getFirestore();

export const generateQuizQuestions = onFlow(
  {
    name: "generateQuizQuestions",
    inputSchema: generateQuizQuestionsFlowSchema.inputSchema,
    outputSchema: generateQuizQuestionsFlowSchema.outputSchema,
    authPolicy: process.env.FUNCTIONS_EMULATOR
      ? noAuth()
      : firebaseAuth((user) => {
          if (!user || !user.isAdmin) {
            throw new HttpsError("unauthenticated", "User must be logged in!");
          }
        }),
  },
  async (input, streamingCallback) => {
    if (streamingCallback) {
      throw new HttpsError(
        "failed-precondition",
        "This flow cannot be invoked using streaming!",
      );
    }

    const llmResponse = await generate({
      model: gemini15Pro,
      prompt: `
        Respond as JSON only.
        You are a helpful assistant. Using the tools available, try to generate questions and appropriate answers for a quiz.
        You should only generate n number of questions, n is the number of questions that should be generated based on the tool's response.
        You should take the following criteria in mind:
          * generate questions and answers based on the topics that the user has chosen, you may use the appropriate tools to get this information, you may use any distribution to generate the questions from the provided topics randomly
          * generate questions and answers based on the questions that the user has chosen, you may use the appropriate tools to get this information, you must include these questions and you must only provide potential asnwers to these questions
          * you may also generate more than one potential answers to a given question

        QuizId: ${input.quizId}
      `,
      tools: [quizTool],
      // output: {
      //   schema: z
      //     .object({
      //       questions: z
      //         .array(
      //           z.object({
      //             question: z
      //               .string()
      //               .describe("The questions for the given quiz question."),
      //             answer: z
      //               .array(z.string())
      //               .describe("The array of answers to the question."),
      //           }),
      //         )
      //         .describe(
      //           "An array of objects that contains the question and answers to that question.",
      //         ),
      //     })
      //     .describe(
      //       "JavaScript Object which contains the questions property that indicates the questions for the user",
      //     ),
      // },
    });

    const quizId = input.quizId;
    const output = llmResponse.output();

    if (!output) {
      throw new HttpsError("internal", "Unable to generate questions!");
    }

    const quizQuestionsRef = getFirelord<QuizQuestions>(
      db,
      "quizes",
      "quizQuestions",
    );

    const batch = writeBatch(db);

    for (const question of output.questions) {
      const quizQuestionDoc = quizQuestionsRef.doc(
        quizQuestionsRef.collection(quizId),
      );

      batch.create(quizQuestionDoc, {
        answers: question.answer.map((answer) => ({ answer })),
        question: question.question,
        isChosen: null,
        isRemoved: null,
      });
    }

    await batch.commit();

    return { status: "done" as const };
  },
);
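
With output.schema commented out as above, nothing guarantees that output has a questions array, so the loop over output.questions can fail at runtime. A minimal sketch of a hand-written check for the desired shape, assuming the same structure as the zod schema in the issue (a questions array of objects with a question string and an answer string array). The type and function names are hypothetical:

```typescript
// Desired shape, mirroring the zod schema from the issue.
interface QuizQuestion {
  question: string;
  answer: string[];
}

interface QuizOutput {
  questions: QuizQuestion[];
}

// True only for arrays whose items are all strings.
function isStringArray(value: unknown): value is string[] {
  return (
    Array.isArray(value) &&
    value.every(function (item) {
      return typeof item === "string";
    })
  );
}

// True only for objects with a string question and a string-array answer.
function isQuizQuestion(value: unknown): value is QuizQuestion {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as { question?: unknown; answer?: unknown };
  return typeof candidate.question === "string" && isStringArray(candidate.answer);
}

// True only for objects with a questions array of valid quiz questions.
function isQuizOutput(value: unknown): value is QuizOutput {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const questions = (value as { questions?: unknown }).questions;
  return (
    Array.isArray(questions) &&
    questions.every(function (item) {
      return isQuizQuestion(item);
    })
  );
}
```

In the flow, a guard such as if (!isQuizOutput(output)) could throw the existing "Unable to generate questions!" error before the Firestore batch is built, instead of failing inside the loop.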

and the Genkit local dashboard output:

input:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "\n        Respond as JSON only.\n        You are a helpful assistant. Using the tools available, try to generate questions and appropriate answers for a quiz.\n        You should only generate n number of questions, n is the number of questions that should be generated based on the tool's response.\n        You should take the following criteria in mind:\n          * generate questions and answers based on the topics that the user has chosen, you may use the appropriate tools to get this information, you may use any distribution to generate the questions from the provided topics randomly\n          * generate questions and answers based on the questions that the user has chosen, you may use the appropriate tools to get this information, you must include these questions and you must only provide potential asnwers to these questions\n          * you may also generate more than one potential answers to a given question\n        \n        QuizId: 0qfxuNZiTgBsQ1ZXYpQ5\n      "
        }
      ]
    }
  ],
  "tools": [
    {
      "name": "quizTool",
      "description": "\n    This tool is useful for getting information about a quiz. The information includes the following:\n      * the number of questions that should be generated\n      * the topics that the user has chosen\n      * the user questions that should be included in the questions\n  ",
      "outputSchema": {
        "type": "object",
        "properties": {
          "numberOfQuestions": {
            "type": "number",
            "description": "The amount of questions that should be generated"
          },
          "userQuestions": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "description": "The questions that were chosen by the user and must be included in the generated question set with the provided answers"
          },
          "userTopics": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "description": "The topics that were chosen by the user for which you must generate questions and the appropriate answers."
          }
        },
        "required": [
          "numberOfQuestions",
          "userQuestions",
          "userTopics"
        ],
        "additionalProperties": true,
        "description": "JavaScript object which contains the numberOfQuestions property, userQuestions and userTopics properties.",
        "$schema": "http://json-schema.org/draft-07/schema#"
      },
      "inputSchema": {
        "type": "object",
        "properties": {
          "quizId": {
            "type": "string",
            "description": "The ID of the quiz about which you require more information."
          }
        },
        "required": [
          "quizId"
        ],
        "additionalProperties": true,
        "description": "JavaScript object which contains the quiz id property, which should be the id of the quiz that you require the information about.",
        "$schema": "http://json-schema.org/draft-07/schema#"
      }
    }
  ],
  "output": {
    "format": "text"
  }
}

Output:

{
  "candidates": [
    {
      "index": 0,
      "message": {
        "role": "model",
        "content": [
          {
            "toolRequest": {
              "name": "quizTool",
              "input": {
                "quizId": "0qfxuNZiTgBsQ1ZXYpQ5"
              }
            }
          }
        ]
      },
      "finishReason": "stop",
      "custom": {
        "safetyRatings": [
          {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "probability": "NEGLIGIBLE"
          },
          {
            "category": "HARM_CATEGORY_HARASSMENT",
            "probability": "NEGLIGIBLE"
          },
          {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "probability": "NEGLIGIBLE"
          },
          {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "probability": "NEGLIGIBLE"
          }
        ]
      }
    }
  ],
  "custom": {
    "candidates": [
      {
        "content": {
          "parts": [
            {
              "functionCall": {
                "name": "quizTool",
                "args": {
                  "quizId": "0qfxuNZiTgBsQ1ZXYpQ5"
                }
              }
            }
          ],
          "role": "model"
        },
        "finishReason": "STOP",
        "index": 0,
        "safetyRatings": [
          {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "probability": "NEGLIGIBLE"
          },
          {
            "category": "HARM_CATEGORY_HARASSMENT",
            "probability": "NEGLIGIBLE"
          },
          {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "probability": "NEGLIGIBLE"
          },
          {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "probability": "NEGLIGIBLE"
          }
        ]
      }
    ],
    "usageMetadata": {
      "promptTokenCount": 287,
      "candidatesTokenCount": 29,
      "totalTokenCount": 316
    }
  },
  "usage": {
    "inputCharacters": 975,
    "inputImages": 0,
    "inputVideos": 0,
    "inputAudioFiles": 0,
    "outputCharacters": 0,
    "outputImages": 0,
    "outputVideos": 0,
    "outputAudioFiles": 0,
    "inputTokens": 287,
    "outputTokens": 29,
    "totalTokens": 316
  },
  "latencyMs": 2156.761416912079
}

Here is the input for the call that errors out:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "text": "\n        Respond as JSON only.\n        You are a helpful assistant. Using the tools available, try to generate questions and appropriate answers for a quiz.\n        You should only generate n number of questions, n is the number of questions that should be generated based on the tool's response.\n        You should take the following criteria in mind:\n          * generate questions and answers based on the topics that the user has chosen, you may use the appropriate tools to get this information, you may use any distribution to generate the questions from the provided topics randomly\n          * generate questions and answers based on the questions that the user has chosen, you may use the appropriate tools to get this information, you must include these questions and you must only provide potential asnwers to these questions\n          * you may also generate more than one potential answers to a given question\n        \n        QuizId: 0qfxuNZiTgBsQ1ZXYpQ5\n      "
        },
        {
          "text": "\n\nOutput should be in JSON format and conform to the following schema:\n\n```\n{\"type\":\"object\",\"properties\":{\"questions\":{\"type\":\"array\",\"items\":{\"type\":\"object\",\"properties\":{\"question\":{\"type\":\"string\",\"description\":\"The questions for the given quiz question.\"},\"answer\":{\"type\":\"array\",\"items\":{\"type\":\"string\"},\"description\":\"The array of answers to the question.\"}},\"required\":[\"question\",\"answer\"],\"additionalProperties\":true},\"description\":\"An array of objects that contains the question and answers to that question.\"}},\"required\":[\"questions\"],\"additionalProperties\":true,\"description\":\"JavaScript Object which contains the questions property that indicates the questions for the user\",\"$schema\":\"http://json-schema.org/draft-07/schema#\"}\n```\n",
          "metadata": {
            "purpose": "output",
            "source": "default"
          }
        }
      ]
    }
  ],
  "tools": [
    {
      "name": "quizTool",
      "description": "\n    This tool is useful for getting information about a quiz. The information includes the following:\n      * the number of questions that should be generated\n      * the topics that the user has chosen\n      * the user questions that should be included in the questions\n  ",
      "outputSchema": {
        "type": "object",
        "properties": {
          "numberOfQuestions": {
            "type": "number",
            "description": "The amount of questions that should be generated"
          },
          "userQuestions": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "description": "The questions that were chosen by the user and must be included in the generated question set with the provided answers"
          },
          "userTopics": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "description": "The topics that were chosen by the user for which you must generate questions and the appropriate answers."
          }
        },
        "required": [
          "numberOfQuestions",
          "userQuestions",
          "userTopics"
        ],
        "additionalProperties": true,
        "description": "JavaScript object which contains the numberOfQuestions property, userQuestions and userTopics properties.",
        "$schema": "http://json-schema.org/draft-07/schema#"
      },
      "inputSchema": {
        "type": "object",
        "properties": {
          "quizId": {
            "type": "string",
            "description": "The ID of the quiz about which you require more information."
          }
        },
        "required": [
          "quizId"
        ],
        "additionalProperties": true,
        "description": "JavaScript object which contains the quiz id property, which should be the id of the quiz that you require the information about.",
        "$schema": "http://json-schema.org/draft-07/schema#"
      }
    }
  ],
  "output": {
    "format": "json",
    "schema": {
      "type": "object",
      "properties": {
        "questions": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "question": {
                "type": "string",
                "description": "The questions for the given quiz question."
              },
              "answer": {
                "type": "array",
                "items": {
                  "type": "string"
                },
                "description": "The array of answers to the question."
              }
            },
            "required": [
              "question",
              "answer"
            ],
            "additionalProperties": true
          },
          "description": "An array of objects that contains the question and answers to that question."
        }
      },
      "required": [
        "questions"
      ],
      "additionalProperties": true,
      "description": "JavaScript Object which contains the questions property that indicates the questions for the user",
      "$schema": "http://json-schema.org/draft-07/schema#"
    }
  }
}

To Reproduce

  1. Use Gemini 1.5 Pro from the Google AI Genkit plugin.
  2. Specify any tool for the model to call.
  3. Specify output.schema for the generate call.

Expected behavior I would expect to be able to call functions / tools via the model and still get output that is valid JSON and parsable with Zod. I thought this was possible according to the documentation and sample code on the Genkit site, as well as the model description page.

image

where it is stated that the model supports:

Screenshots please see the screenshot above, thank you.

Runtime (please complete the following information):

** Node version

Additional context Should I cross-post this to https://github.com/google-gemini/generative-ai-js, as that seems to be the root cause of the issue? 🤔 Is this something that is not currently supported? 🤔 It seems weird that I'm unable to specify a concrete schema for the generate function when the model itself is capable of returning JSON / object responses. Am I missing something? Thank you 🙏
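Since the report notes that the model still returns valid JSON once output.schema is removed, one possible stopgap is to validate the text response by hand. The sketch below is only an illustration under that assumption: `parseQuizOutput`, `stripCodeFence`, and the two interfaces are hypothetical names and not part of Genkit. The shape they check mirrors the `questions` / `question` / `answer` schema shown in the failing request above.

```typescript
// Shape taken from the output schema in the failing request.
interface QuizQuestion {
  question: string;
  answer: string[];
}

interface QuizOutput {
  questions: QuizQuestion[];
}

// Three backticks, built at runtime so this example contains no literal fence.
const FENCE = "`".repeat(3);
const fencePattern = new RegExp(
  "^" + FENCE + "(?:json)?\\s*([\\s\\S]*?)\\s*" + FENCE + "$",
);

// Remove a surrounding Markdown code fence, if the model added one.
function stripCodeFence(text: string): string {
  const trimmed = text.trim();
  const match = trimmed.match(fencePattern);
  return match?.[1] ?? trimmed;
}

// Type guard for a single question entry.
function isQuizQuestion(value: unknown): value is QuizQuestion {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.question === "string" &&
    Array.isArray(v.answer) &&
    v.answer.every((a) => typeof a === "string")
  );
}

// Parse the model's text response and check it against the expected shape.
// Throws if the text is not JSON or does not match.
function parseQuizOutput(text: string): QuizOutput {
  const parsed: unknown = JSON.parse(stripCodeFence(text));
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Model output is not a JSON object");
  }
  const questions = (parsed as Record<string, unknown>).questions;
  if (!Array.isArray(questions) || !questions.every(isQuizQuestion)) {
    throw new Error("Model output does not match the quiz schema");
  }
  return { questions };
}
```

In a flow like the one above, the text of the model response would be passed to `parseQuizOutput` in place of reading the structured output. A Zod schema could replace the hand-written guard.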

MichaelDoyle commented 1 month ago

We will take a look at this. Thanks for reporting!

Changed the title for readability as the exact error message(s) is/are repeated in the bug report itself.

lazakrisz commented 1 month ago

> We will take a look at this. Thanks for reporting!
>
> Changed the title for readability as the exact error message(s) is/are repeated in the bug report itself.

Thank you @MichaelDoyle 🙏

cabljac commented 1 month ago

Looking into this, just reproduced it. It seems to happen with Gemini 1.5 Flash too, so it is unlikely to be specific to the model.

cabljac commented 1 month ago

Linking for reference, here is another issue where they're getting the same message: https://github.com/google-gemini/generative-ai-swift/issues/195

cabljac commented 1 month ago

I believe this is a limitation of the Gemini API at the moment. I will try to follow up and get it passed on to them.

ariel-pettyjohn commented 1 month ago

> Looking into this, just reproduced it. It seems to happen with Gemini 1.5 Flash too, so it is unlikely to be specific to the model.

Can confirm @cabljac, I just came across this same bug using 1.5 Flash.

ariel-pettyjohn commented 1 month ago

> I believe this is a limitation of the Gemini API at the moment. I will try to follow up and get it passed on to them.

This is a good example of why I'm so curious about the philosophy behind when to use tools vs. custom services: https://github.com/firebase/genkit/discussions/731. This seems like a use-case where services might be necessary, at least for the time being, if we want to do things like compose model responses, resolve promises, etc., while also validating output?

lazakrisz commented 1 month ago

> I believe this is a limitation of the Gemini API at the moment. I will try to follow up and get it passed on to them.

Okay, thank you.

lazakrisz commented 1 month ago

> > I believe this is a limitation of the Gemini API at the moment. I will try to follow up and get it passed on to them.
>
> This is a good example of why I'm so curious about the philosophy behind when to use tools vs. custom services: #731. This seems like a use-case where services might be necessary, at least for the time being, if we want to do things like compose model responses, resolve promises, etc., while also validating output?

In my opinion, Genkit is such a "service": it does parsing, validation, and generation, has an interface for tools, etc. I believe the end user shouldn't have to build their own abstraction on top of these building blocks, just like I don't have to build any abstraction layers when I'm using Firestore, Firebase Storage, callable functions, or any other Firebase service.

ariel-pettyjohn commented 4 weeks ago

That more or less aligns with my understanding of Genkit's goals @lazakrisz, and I agree. I initially developed a "service" abstraction for my own use-case because I was just stubbing things in until I got up to speed using tools. It was only when I went to refactor and replace the service abstraction with Genkit's tools that I hit up against this limitation.

Specifically, I'm identifying the names/types of fields in a form given a form name. I then use the field type to determine the corresponding candidate field attributes, which get used to dynamically generate an output schema. The current inefficiency is that I'm essentially using this service abstraction to fan-out model calls for every field/attribute pair. I'm working on creating one mega-prompt that asks the model to select from the candidate attributes in one go, which would get it down to just two API calls.

With support for output schema though, it seems like I could accomplish all of this in one call using a tool to map the field type to its candidate attributes, which would be even more awesome

Okay, looking at @MichaelDoyle's response in this other discussion, I don't think this is actually true.

In this case, even though I just need a tool to retrieve a static value from a local object or JSON file, I suppose there would still need to be two API calls.
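The two-call observation can be illustrated with a small simulation. This is only a sketch with a fake, synchronous model. `fakeModel`, `runWithTool`, the `attributesForType` tool name, and the lookup table are all hypothetical and do not reflect Genkit's internals. It shows why a local lookup still costs two model round trips: one in which the model asks for the tool (the `toolRequest` seen in the dashboard output above), and one in which it answers using the tool's result.

```typescript
// A model turn is either a request to run a tool or a final text answer.
type ModelTurn =
  | { kind: "toolRequest"; name: string; input: { fieldType: string } }
  | { kind: "text"; text: string };

// Static lookup table standing in for a local object or JSON file.
const candidateAttributes: Record<string, string[]> = {
  text: ["maxLength", "placeholder"],
  number: ["min", "max"],
};

// Fake model: without a tool result it asks for the tool; with one it answers.
// Each invocation stands in for one request to the model API.
function fakeModel(toolResult?: string[]): ModelTurn {
  if (toolResult === undefined) {
    return {
      kind: "toolRequest",
      name: "attributesForType",
      input: { fieldType: "number" },
    };
  }
  return { kind: "text", text: JSON.stringify({ attributes: toolResult }) };
}

// Drive the tool loop and count how many model calls it takes.
function runWithTool(): { text: string; modelCalls: number } {
  let modelCalls = 1;
  let turn = fakeModel();
  while (turn.kind === "toolRequest") {
    // The tool itself runs locally and needs no network access.
    const result = candidateAttributes[turn.input.fieldType] ?? [];
    turn = fakeModel(result);
    modelCalls += 1;
  }
  return { text: turn.text, modelCalls };
}

console.log(runWithTool());
```

The lookup itself is free. The second call exists only because the model has to see the tool's result before it can produce the final answer.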

cabljac commented 4 weeks ago

@lazakrisz @ariel-pettyjohn

Are you two in the Genkit Discord? I feel like this and any other feedback you have would be highly valued there.

ariel-pettyjohn commented 4 weeks ago

Just joined @cabljac, thanks for the suggestion!

mbleigh commented 1 week ago

My suggestion for this is to check for the condition of tools + JSON mode in the Gemini plugins (Google/Vertex) and simply not set JSON mode when that's the case.

Gemini output was pretty good before we even had JSON mode implemented, so it should work in the majority of cases. We can separately suggest it as a fix at the Gemini API layer.
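A minimal sketch of that check, assuming a simplified request shape: the interfaces and `buildGenerationConfig` below are hypothetical and do not mirror the actual plugin code. Only the `application/json` response MIME type comes from the error message in this report.

```typescript
// Hypothetical, simplified shapes; the real Genkit plugin types differ.
interface ToolDefinition {
  name: string;
  description: string;
}

interface GenerateRequestLike {
  tools?: ToolDefinition[];
  output?: { format?: "json" | "text" | "media"; schema?: unknown };
}

interface GenerationConfigLike {
  responseMimeType?: string;
}

// Decide whether to turn on Gemini's JSON mode for this request.
// JSON mode is skipped when tools are present, because the API rejects
// function calling combined with a JSON response MIME type.
function buildGenerationConfig(
  request: GenerateRequestLike,
): GenerationConfigLike {
  const wantsJson = request.output?.format === "json";
  const hasTools = (request.tools?.length ?? 0) > 0;

  const config: GenerationConfigLike = {};
  if (wantsJson && !hasTools) {
    config.responseMimeType = "application/json";
  }
  return config;
}

const quizTool: ToolDefinition = {
  name: "quizTool",
  description: "Looks up quiz details",
};

// JSON mode stays on without tools and is dropped when a tool is present.
console.log(buildGenerationConfig({ output: { format: "json" } }));
console.log(
  buildGenerationConfig({ tools: [quizTool], output: { format: "json" } }),
);
```

With this approach the schema instructions injected into the prompt would remain the only JSON guidance for tool-using requests, so the response would still need to be parsed and validated afterwards.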