firebase / genkit

An open source framework for building AI-powered apps with familiar code-centric patterns. Genkit makes it easy to integrate, test, and deploy sophisticated AI features to Firebase or Google Cloud.
Apache License 2.0

[JS] 400 error on OAuth (defineFirestoreRetriever retrieve call) #553

Closed: cabljac closed this issue 2 months ago

cabljac commented 2 months ago

Describe the bug

EDIT: See comments; this isn't related to the retriever. It's caused by the embed method and its authentication to GCP.

I was investigating https://github.com/firebase/genkit/issues/452, and when I attempt to retrieve some docs following the official documentation for Firestore retrievers, I get a 400 error. Here are the logs:

Registering plugin firebase...
Registering plugin vertexai...
Registering plugin dotprompt...
Registering flow state stores...
Registering trace stores...
  - prod: firebase
Registering retriever: retriever-id
Initializing plugin vertexai:
Registering model: vertexai/imagen2
Registering model: vertexai/gemini-1.0-pro
Registering model: vertexai/gemini-1.0-pro-vision
Registering model: vertexai/gemini-1.5-pro
Registering model: vertexai/gemini-1.5-flash
Registering model: vertexai/gemini-1.5-pro-preview
Registering model: vertexai/gemini-1.5-flash-preview
Registering embedder: vertexai/textembedding-gecko@003
Registering embedder: vertexai/textembedding-gecko@002
Registering embedder: vertexai/textembedding-gecko@001
Registering embedder: vertexai/text-embedding-004
Registering embedder: vertexai/textembedding-gecko-multilingual@001
Registering embedder: vertexai/text-multilingual-embedding-002
GaxiosError: invalid_scope: Invalid OAuth scope or ID token audience provided.
    at Gaxios._request (/Users/jacob/test-playground-1/node_modules/gaxios/build/src/gaxios.js:142:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async GoogleToken._GoogleToken_requestToken (/Users/jacob/test-playground-1/node_modules/gtoken/build/src/index.js:241:19)
    at async GoogleToken._GoogleToken_getTokenAsync (/Users/jacob/test-playground-1/node_modules/gtoken/build/src/index.js:160:16)
    at async JWT.refreshTokenNoCache (/Users/jacob/test-playground-1/node_modules/google-auth-library/build/src/auth/jwtclient.js:173:23)
    at async JWT.refreshAccessTokenAsync (/Users/jacob/test-playground-1/node_modules/google-auth-library/build/src/auth/oauth2client.js:247:19)
    at async JWT.getAccessTokenAsync (/Users/jacob/test-playground-1/node_modules/google-auth-library/build/src/auth/oauth2client.js:276:23)
    at async GoogleAuth.getAccessToken (/Users/jacob/test-playground-1/node_modules/google-auth-library/build/src/auth/googleauth.js:718:17) {
  config: {
    method: 'POST',
    url: 'https://www.googleapis.com/oauth2/v4/token',
    data: {
      grant_type: '<<REDACTED> - See `errorRedactor` option in `gaxios` for configuration>.',
      assertion: '<<REDACTED> - See `errorRedactor` option in `gaxios` for configuration>.'
    },
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'User-Agent': 'google-api-nodejs-client/9.11.0',
      'x-goog-api-client': 'gl-node/20.11.1',
      Accept: 'application/json'
    },
    responseType: 'json',
    retryConfig: {
      httpMethodsToRetry: [Array],
      currentRetryAttempt: 0,
      retry: 3,
      noResponseRetries: 2,
      retryDelayMultiplier: 2,
      timeOfFirstRequest: 1720438949970,
      totalTimeout: 9007199254740991,
      maxRetryDelay: 9007199254740991,
      statusCodesToRetry: [Array]
    },
    paramsSerializer: [Function: paramsSerializer],
    body: '<<REDACTED> - See `errorRedactor` option in `gaxios` for configuration>.',
    validateStatus: [Function: validateStatus],
    errorRedactor: [Function: defaultErrorRedactor]
  },
  response: {
    config: {
      method: 'POST',
      url: 'https://www.googleapis.com/oauth2/v4/token',
      data: [Object],
      headers: [Object],
      responseType: 'json',
      retryConfig: [Object],
      paramsSerializer: [Function: paramsSerializer],
      body: '<<REDACTED> - See `errorRedactor` option in `gaxios` for configuration>.',
      validateStatus: [Function: validateStatus],
      errorRedactor: [Function: defaultErrorRedactor]
    },
    data: {
      error: 'invalid_scope',
      error_description: 'Invalid OAuth scope or ID token audience provided.'
    },
    headers: {
      'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000',
      'cache-control': 'private',
      'content-encoding': 'gzip',
      'content-type': 'application/json; charset=UTF-8',
      date: 'Mon, 08 Jul 2024 11:42:29 GMT',
      server: 'scaffolding on HTTPServer2',
      'transfer-encoding': 'chunked',
      vary: 'Origin, X-Origin, Referer',
      'x-content-type-options': 'nosniff',
      'x-frame-options': 'SAMEORIGIN',
      'x-xss-protection': '0'
    },
    status: 400,
    statusText: 'Bad Request',
    request: { responseURL: 'https://www.googleapis.com/oauth2/v4/token' }
  },
  error: undefined,
  status: 400,
  [Symbol(gaxios-gaxios-error)]: '6.7.0'
}
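The key fields in this dump are status 400 and response.data.error: 'invalid_scope' from https://www.googleapis.com/oauth2/v4/token, which show the failure happens while requesting an OAuth access token, not in the Firestore query itself. As a minimal sketch, the check below recognizes that shape so a caller can print a more targeted hint. It relies only on the fields visible in the log above; isInvalidScopeError, runWithScopeHint and TokenErrorLike are hypothetical names, not part of gaxios or Genkit.

```typescript
// Hypothetical structural type covering only the GaxiosError fields printed
// in the log above; the real gaxios error type has many more fields.
interface TokenErrorLike {
  status?: number;
  response?: {
    status?: number;
    data?: { error?: string; error_description?: string };
  };
}

// Returns true when an error looks like the OAuth token endpoint rejecting
// the request with "invalid_scope" (HTTP 400), as in the log above.
function isInvalidScopeError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as TokenErrorLike;
  const status = e.response?.status ?? e.status;
  return status === 400 && e.response?.data?.error === "invalid_scope";
}

// Hypothetical wrapper: run an async call (e.g. retrieve or embed) and, if it
// fails with invalid_scope, log a hint before rethrowing the original error.
async function runWithScopeHint<T>(fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch (err) {
    if (isInvalidScopeError(err)) {
      console.error(
        "OAuth token request failed with invalid_scope; the credentials may " +
          "be missing the https://www.googleapis.com/auth/cloud-platform scope."
      );
    }
    throw err;
  }
}
```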

To Reproduce

I ran genkit init to bootstrap a project and created a fairly minimal example:

import { initializeApp } from "firebase-admin/app";
import { getFirestore } from "firebase-admin/firestore";
import { configureGenkit } from "@genkit-ai/core";
import { retrieve } from "@genkit-ai/ai/retriever";
import { defineFirestoreRetriever, firebase } from "@genkit-ai/firebase";
import { textEmbeddingGecko, vertexAI } from "@genkit-ai/vertexai";

if (!process.env.GOOGLE_APPLICATION_CREDENTIALS) {
  throw new Error(
    "GOOGLE_APPLICATION_CREDENTIALS environment variable is not set."
  );
}

initializeApp();

configureGenkit({
  plugins: [firebase(), vertexAI({ location: "us-central1" })],
  logLevel: "debug",
  traceStore: "firebase",
});

const main = async () => {
  const myRetrieverRef = defineFirestoreRetriever({
    name: "retriever-id",
    firestore: getFirestore(),
    collection: "myCollection",
    contentField: "myChunks",
    vectorField: "embedding",
    embedder: textEmbeddingGecko,
    distanceMeasure: "COSINE", // 'EUCLIDEAN', 'DOT_PRODUCT', or 'COSINE' (default)
  });

  const docs = await retrieve({
    retriever: myRetrieverRef,
    query: "Hello World",
    options: {
      limit: 5,
      k: 3,
    },
  });

  console.log(JSON.stringify(docs, null, 2));
};

main().catch(console.error);
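The distanceMeasure option in the repro selects how the query embedding is compared with the stored vectors. The sketch below uses the textbook definitions of the three measures and a brute-force nearest-neighbour ranking to show what the choice means. It is an illustration under that assumption, not Firestore's implementation, whose scoring and ordering details may differ; StoredDoc, nearest, cosineDistance, euclideanDistance and dotProduct are hypothetical names.

```typescript
type DistanceMeasure = "EUCLIDEAN" | "DOT_PRODUCT" | "COSINE";

// Hypothetical in-memory stand-in for a document with an embedding field.
interface StoredDoc {
  id: string;
  embedding: number[];
}

function checkSameLength(a: number[], b: number[]): void {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
}

// Dot product: a similarity, larger means closer (sensitive to magnitude).
function dotProduct(a: number[], b: number[]): number {
  checkSameLength(a, b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Euclidean distance: straight-line distance, smaller means closer.
function euclideanDistance(a: number[], b: number[]): number {
  checkSameLength(a, b);
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Cosine distance: 1 - cosine similarity, so 0 means same direction.
function cosineDistance(a: number[], b: number[]): number {
  const norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
  if (norms === 0) {
    throw new Error("cosine distance is undefined for zero vectors");
  }
  return 1 - dotProduct(a, b) / norms;
}

const scoreFns: Record<DistanceMeasure, (a: number[], b: number[]) => number> = {
  EUCLIDEAN: euclideanDistance,
  COSINE: cosineDistance,
  DOT_PRODUCT: dotProduct,
};

// Brute-force k-nearest-neighbour search: score every document, sort, and
// return the ids of the `limit` closest ones under the chosen measure.
function nearest(
  query: number[],
  docs: StoredDoc[],
  measure: DistanceMeasure,
  limit: number
): string[] {
  const score = scoreFns[measure];
  const scored = docs.map((doc) => ({ id: doc.id, score: score(query, doc.embedding) }));
  // DOT_PRODUCT is a similarity (higher is closer); the others are distances.
  scored.sort((x, y) =>
    measure === "DOT_PRODUCT" ? y.score - x.score : x.score - y.score
  );
  return scored.slice(0, limit).map((s) => s.id);
}
```

With query [1, 0] and stored vectors a = [1, 0], b = [0, 1], c = [2, 0], EUCLIDEAN ranks a before c, while DOT_PRODUCT ranks c first because it rewards the larger magnitude.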

Expected behavior

I would expect the docs to be retrieved without hitting an authentication error. If I retrieve the docs manually with firebase-admin, I do not hit this error.

Runtime

Node version: not filled in (the x-goog-api-client header in the log above reports gl-node/20.11.1)

cabljac commented 2 months ago

I believe the error is actually coming from the embed call.

cabljac commented 2 months ago

Updated repro:

import { configureGenkit } from "@genkit-ai/core";
import { defineFlow } from "@genkit-ai/flow";
import { embed } from "@genkit-ai/ai/embedder";
import { Document } from "@genkit-ai/ai/retriever";
import { firebase } from "@genkit-ai/firebase";
import { textEmbeddingGecko, vertexAI } from "@genkit-ai/vertexai";

if (!process.env.GOOGLE_APPLICATION_CREDENTIALS) {
  throw new Error(
    "GOOGLE_APPLICATION_CREDENTIALS environment variable is not set."
  );
}

configureGenkit({
  plugins: [
    firebase(),
    vertexAI({
      location: "us-central1", // fails whether this is set or not
      projectId: "<REDACTED>", // fails whether this is set or not
      // googleAuth: { keyFilename: process.env.GOOGLE_APPLICATION_CREDENTIALS }, // fails whether this is set or not
    }),
  ],
  logLevel: "debug",
  traceStore: "firebase",
});

export const myFlow = defineFlow(
  {
    name: "myFlow",
  },
  async () => {
    const embedding = await embed({
      embedder: textEmbeddingGecko,
      content: Document.fromText("Hello, world!"),
    });

    console.info(embedding);
  }
);

[Screenshot 2024-07-08 at 13.33.52; image not available]

cabljac commented 2 months ago

This works:

import { firebase } from "@genkit-ai/firebase";
import { vertexAI } from "@genkit-ai/vertexai";
import { configureGenkit } from "@genkit-ai/core";

if (!process.env.GOOGLE_APPLICATION_CREDENTIALS) {
  throw new Error(
    "GOOGLE_APPLICATION_CREDENTIALS environment variable is not set."
  );
}

configureGenkit({
  plugins: [
    firebase(),
    vertexAI({
      // Workaround: request the cloud-platform OAuth scope explicitly.
      googleAuth: {
        scopes: ["https://www.googleapis.com/auth/cloud-platform"],
      },
      location: "us-central1",
    }),
  ],
  logLevel: "debug",
  traceStore: "firebase",
});

Do we need to add scopes manually here?

chrisraygill commented 2 months ago

@ssbushi can you take a look when you get a chance?

ssbushi commented 2 months ago

Thanks @cabljac

I ran into the exact same issue when testing a deployed ChromaDB instance. It has nothing to do with Firestore (or ChromaDB); the cause is the missing OAuth scope for Vertex AI. This only happens when GOOGLE_APPLICATION_CREDENTIALS is used for application credentials, and does not occur when using gcloud auth instead.

You seem to have found the workaround; thanks for that. We definitely should not require explicit scopes here, and I can update the plugin to reflect that.
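One way the plugin could stop requiring explicit scopes is to fill in the cloud-platform scope whenever the caller passes none. This is a minimal sketch, assuming the googleAuth option is handed to google-auth-library's GoogleAuth constructor; withDefaultScopes and GoogleAuthOptionsLike are hypothetical names, and the actual plugin change may look different.

```typescript
// The scope the workaround in this thread passes explicitly.
const CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform";

// Hypothetical, simplified shape of the googleAuth option; the real option is
// assumed to be google-auth-library's GoogleAuthOptions, which has more fields.
interface GoogleAuthOptionsLike {
  keyFilename?: string;
  projectId?: string;
  scopes?: string | string[];
}

// Hypothetical helper: add the cloud-platform scope only when the caller did
// not supply any scopes, so explicitly configured scopes are left untouched.
function withDefaultScopes(
  opts: GoogleAuthOptionsLike = {}
): GoogleAuthOptionsLike {
  const hasScopes = opts.scopes !== undefined && opts.scopes.length > 0;
  if (hasScopes) return opts;
  return { ...opts, scopes: [CLOUD_PLATFORM_SCOPE] };
}
```

Under this assumption, vertexAI({ location: "us-central1" }) with no googleAuth option would request the same scope as the explicit workaround, while a caller who passes its own scopes keeps them unchanged.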