cloudflare / workers-sdk

⛅️ Home to Wrangler, the CLI for Cloudflare Workers®
https://developers.cloudflare.com/workers/
Apache License 2.0

🚀 Feature Request: Provide a Vectorize wrapper to use it from a worker locally #6797

Open frandiox opened 3 weeks ago

frandiox commented 3 weeks ago

Describe the solution

I would like to call Vectorize from local code through a binding. This doesn't seem to be possible right now, so I guess the app has to be deployed in order to use it.

To my understanding, a wrapper around the Vectorize REST API would be enough, especially since Wrangler already includes a client for it. It could work like the Workers AI wrapper, which is added to the Miniflare options as a wrappedBinding plus a worker.

Would this be possible or is there any other limitation? 🤔

frandiox commented 3 weeks ago

I tried creating the wrapper mentioned above and it seems to work. Adding the code here in case anyone needs it:

PNPM patch to expose the existing Vectorize client:

diff --git a/wrangler-dist/cli.js b/wrangler-dist/cli.js
index 4bee5475a92f75fb7bb41c43566ef8289cfd4448..9a13a8e2a5970ae37e92156e33a155de97b732af 100644
--- a/wrangler-dist/cli.js
+++ b/wrangler-dist/cli.js
@@ -210778,6 +210778,12 @@ async function deleteMetadataIndex(config, indexName, payload) {
   );
 }
 __name(deleteMetadataIndex, "deleteMetadataIndex");
+module.exports.__vectorize = {
+  insertIntoIndex,
+  upsertIntoIndex,
+  queryIndex,
+  getByIds,
+}
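
Since the patch only adds a property to the bundled CLI, it can silently disappear after a Wrangler upgrade. A small guard like the one below could fail early with a readable message. This is only a sketch: `requireVectorizeExports` is a hypothetical helper (not part of Wrangler), and checking the `default` export as well is an assumption about how the CommonJS bundle may surface through `import()`.

```typescript
// Hypothetical helper, not part of Wrangler: verifies the patched export exists.
type VectorizeExports = Record<
  "insertIntoIndex" | "upsertIntoIndex" | "queryIndex" | "getByIds",
  (...args: unknown[]) => Promise<unknown>
>;

// The four function names exposed by the patch above.
const REQUIRED = ["insertIntoIndex", "upsertIntoIndex", "queryIndex", "getByIds"] as const;

function requireVectorizeExports(mod: unknown): VectorizeExports {
  const ns = mod as { __vectorize?: unknown; default?: { __vectorize?: unknown } } | null;
  // Assumption: a CommonJS module loaded through import() may expose the
  // property on the namespace object, on its default export, or on both.
  const candidate = ns?.__vectorize ?? ns?.default?.__vectorize;
  if (typeof candidate !== "object" || candidate === null) {
    throw new Error("wrangler.__vectorize is missing: was the pnpm patch applied?");
  }
  const fns = candidate as Record<string, unknown>;
  for (const name of REQUIRED) {
    if (typeof fns[name] !== "function") {
      throw new Error(`wrangler.__vectorize.${name} is not a function`);
    }
  }
  return candidate as VectorizeExports;
}
```

It would be called as `requireVectorizeExports(await import("wrangler"))` before building the bindings.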

Wrapped binding and service worker for Vectorize, to add to Miniflare:

import type { Vectorize } from "@cloudflare/workers-types/experimental";

async function createVectorizeBindings(
  indexName: string,
  config: { account_id?: string } = {}
) {
  // Import the functions that are exported after applying the previous patch
  const {
    // @ts-ignore
    __vectorize: { insertIntoIndex, queryIndex },
  } = await import("wrangler");

  const vectorizeClient = {
    insert(vectors) {
      const formData = new FormData();
      formData.append(
        "vectors",
        new File(
          [vectors.map((v) => JSON.stringify(v)).join("\n")],
          "vectors.ndjson",
          { type: "application/x-ndjson" }
        )
      );

      return insertIntoIndex(config, indexName, formData);
    },
    query(vector, options) {
      return queryIndex(config, indexName, vector, options);
    },
    // ... implement other methods
  } satisfies Partial<Vectorize>;

  const VECTORIZE_BINDING_NAME = "__VECTORIZE_WORKER";

  return {
    wrappedBindings: { scriptName: VECTORIZE_BINDING_NAME },
    serviceWorker: {
      name: VECTORIZE_BINDING_NAME,
      serviceBindings: {
        async VECTORIZE_RUN(request: Request) {
          const { prop, args } = await request.json<{
            prop: keyof typeof vectorizeClient;
            args: Array<any>;
          }>();

          if (!vectorizeClient[prop]) {
            return new Response(`Method "${prop}" not found`, { status: 500 });
          }

          // @ts-ignore
          const result = await vectorizeClient[prop]
            .apply(null, args)
            .catch((error: Error) => error);

          return !result || result instanceof Error
            ? new Response(
                `Failed to fetch Vectorize API: ${
                  result?.message ?? "Unknown error"
                }`,
                { status: 500 }
              )
            : new Response(JSON.stringify(result), { status: 200 });
        },
      },
      modules: true,
      script: `
        class Vectorize {
          constructor(env) {
            return new Proxy(this, {
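              get(target, prop) {
                // Only forward string-named calls; skipping "then" keeps the proxy from being treated as a thenable
                if (typeof prop !== "string" || prop === "then") return undefined;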
                return async (...args) => {
                  const response = await env.VECTORIZE_RUN.fetch("http://example.com/", {
                    method: "POST", body: JSON.stringify({prop, args})
                  });

                  if (response.ok) {
                    return response.json();
                  } else {
                    throw new Error(await response.text().catch(() => "Failed to fetch Vectorize API"));
                  }
                };
              }
            });
          }
        }

        export default function (env) {
          return new Vectorize(env);
        }
    `,
    },
  };
}
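
To make the round trip easier to follow, here is a stripped-down sketch of the same idea that runs without Miniflare or Wrangler: the worker-side proxy turns a method call into a POST carrying `{ prop, args }`, and the Node-side handler looks the method up and sends the result back as JSON. `fakeClient`, `handleRun` and `createProxy` are made-up names for illustration, the client returns canned data instead of calling the Vectorize API, and the extra guard for symbol keys and `then` is my own addition. It assumes a runtime with global `Request`/`Response` (e.g. Node 18+).

```typescript
// Stand-in for the real Vectorize client: returns canned data, no network calls.
const fakeClient: Record<string, (...args: any[]) => Promise<unknown>> = {
  async query(vector: number[], options?: { topK?: number }) {
    return { count: 0, matches: [], received: { dims: vector.length, topK: options?.topK ?? null } };
  },
};

// Node side: decode { prop, args }, call the matching method, encode the result.
async function handleRun(request: Request): Promise<Response> {
  const { prop, args } = (await request.json()) as { prop: string; args: unknown[] };
  const known = Object.prototype.hasOwnProperty.call(fakeClient, prop);
  if (!known) {
    return new Response(`Method "${prop}" not found`, { status: 500 });
  }
  try {
    const result = await fakeClient[prop](...args);
    return new Response(JSON.stringify(result), { status: 200 });
  } catch (error) {
    const message = error instanceof Error ? error.message : "Unknown error";
    return new Response(`Failed to fetch Vectorize API: ${message}`, { status: 500 });
  }
}

// Worker side: every method call becomes a POST that describes the call.
function createProxy(run: (request: Request) => Promise<Response>) {
  return new Proxy({} as Record<string, (...args: unknown[]) => Promise<any>>, {
    get(_target, prop) {
      // Ignore symbols and "then" so the proxy is never mistaken for a thenable.
      if (typeof prop !== "string" || prop === "then") return undefined;
      return async (...args: unknown[]) => {
        const response = await run(
          new Request("http://example.com/", {
            method: "POST",
            body: JSON.stringify({ prop, args }),
          })
        );
        if (!response.ok) throw new Error(await response.text());
        return response.json();
      };
    },
  });
}

// In Miniflare, `run` would be env.VECTORIZE_RUN.fetch; here it is called directly.
const binding = createProxy(handleRun);
```

One consequence of this design is that arguments and results must survive `JSON.stringify`, so typed arrays such as `Float32Array` would need to be converted to plain arrays before being passed through.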

Pass it to Miniflare options:

const vectorizeBindings = await createVectorizeBindings('index-name');

new Miniflare({
  // ...
  workers: [
    {
      name: 'main-worker',
      script: '...',
      // ...
      wrappedBindings: {
        VECTORIZE: vectorizeBindings.wrappedBindings,
      },
    },
    vectorizeBindings.serviceWorker,
  ],
});
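
With that in place, the main worker should be able to use `env.VECTORIZE` like a regular binding. A rough sketch of what that could look like follows. The local `VectorizeLike` and `Env` types are simplified stand-ins so the snippet is self-contained (a real project would use the `Vectorize` type from `@cloudflare/workers-types`), the query vector is a placeholder, and the snippet assumes the main worker is written as a module worker.

```typescript
// Simplified stand-ins for the real Vectorize types (assumption, for illustration only).
interface VectorizeLike {
  query(
    vector: number[],
    options?: { topK?: number }
  ): Promise<{ matches: { id: string; score: number }[] }>;
}

interface Env {
  // Same name as the key used under wrappedBindings in the Miniflare options.
  VECTORIZE: VectorizeLike;
}

const worker = {
  async fetch(_request: Request, env: Env): Promise<Response> {
    // Placeholder embedding; its length has to match the dimensions of the index.
    const queryVector = [0.1, 0.2, 0.3];
    const { matches } = await env.VECTORIZE.query(queryVector, { topK: 3 });
    return new Response(JSON.stringify(matches), {
      headers: { "content-type": "application/json" },
    });
  },
};

// In an actual module worker this object would be the default export:
// export default worker;
```

Only the methods implemented in `vectorizeClient` will work; calling anything else through the binding ends up in the "not found" branch of the service binding.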