chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

Issues running `npm`:`chromadb` inside `nextjs` #953

Closed jeffchuber closed 12 months ago

jeffchuber commented 1 year ago

What's wrong

Chroma's npm package chromadb is currently incompatible with nextjs builds.

Here is a very minimal repo you can clone and run - https://github.com/jeffchuber/nextjs-chroma

git clone https://github.com/jeffchuber/nextjs-chroma
cd nextjs-chroma
yarn
yarn dev
# visit localhost:3000

Relevant logs

Inclusion breaks nextjs.

~/src/nextjs-chroma main !2 ?1 > yarn dev                                                                           8s 11:23:30 PM
yarn run v1.22.19
$ next dev
- ready started server on 0.0.0.0:3000, url: http://localhost:3000
- event compiled client and server successfully in 219 ms (20 modules)
- wait compiling...
- event compiled client and server successfully in 74 ms (20 modules)
- wait compiling /page (client and server)...
- error ./node_modules/chromadb/dist/module/embeddings/WebAIEmbeddingFunction.js:131:0
Module not found: Can't resolve '@visheratin/web-ai'

https://nextjs.org/docs/messages/module-not-found

Import trace for requested module:
./node_modules/chromadb/dist/module/index.js
./src/app/page.js
- wait compiling /_error (client and server)...
- error ./node_modules/chromadb/dist/module/embeddings/WebAIEmbeddingFunction.js:131:0
Module not found: Can't resolve '@visheratin/web-ai'

https://nextjs.org/docs/messages/module-not-found

Import trace for requested module:
./node_modules/chromadb/dist/module/index.js
./src/app/page.js

If you remove this file, we also logspam a bunch of further module-not-found errors into nextjs:

Module not found: Can't resolve 'cohere-ai' in '/Users/jeff/src/nextjs-chroma/node_modules/chromadb/dist/module/embeddings'

Import trace for requested module:
./node_modules/chromadb/dist/module/embeddings/CohereEmbeddingFunction.js
./node_modules/chromadb/dist/module/index.js
./src/app/page.js

./node_modules/chromadb/dist/module/embeddings/OpenAIEmbeddingFunction.js
Module not found: Can't resolve 'openai' in '/Users/jeff/src/nextjs-chroma/node_modules/chromadb/dist/module/embeddings'

Import trace for requested module:
./node_modules/chromadb/dist/module/embeddings/OpenAIEmbeddingFunction.js
./node_modules/chromadb/dist/module/index.js
./src/app/page.js

./node_modules/chromadb/dist/main/embeddings/CohereEmbeddingFunction.js
Module not found: Can't resolve 'cohere-ai' in '/Users/jeff/src/nextjs-chroma/node_modules/chromadb/dist/main/embeddings'

Import trace for requested module:
./node_modules/chromadb/dist/main/embeddings/CohereEmbeddingFunction.js
./node_modules/chromadb/dist/main/index.js
./src/app/page.js

./node_modules/chromadb/dist/main/embeddings/OpenAIEmbeddingFunction.js
Module not found: Can't resolve 'openai' in '/Users/jeff/src/nextjs-chroma/node_modules/chromadb/dist/main/embeddings'

Import trace for requested module:
./node_modules/chromadb/dist/main/embeddings/OpenAIEmbeddingFunction.js
./node_modules/chromadb/dist/main/index.js
./src/app/page.js

./node_modules/chromadb/node_modules/node-fetch/lib/index.js
Module not found: Can't resolve 'encoding' in '/Users/jeff/src/nextjs-chroma/node_modules/chromadb/node_modules/node-fetch/lib'

Import trace for requested module:
./node_modules/chromadb/node_modules/node-fetch/lib/index.js
./node_modules/chromadb/node_modules/isomorphic-fetch/fetch-npm-node.js
./node_modules/chromadb/dist/main/generated/runtime.js
./node_modules/chromadb/dist/main/generated/index.js
./node_modules/chromadb/dist/main/ChromaClient.js
./node_modules/chromadb/dist/main/index.js
./src/app/page.js

perzeuss commented 1 year ago

@jeffchuber chromadb is not installed in the example repo you provided. Did you forget to push something?

jeffchuber commented 1 year ago

@perzeuss i did indeed! pushed

perzeuss commented 1 year ago

The error you're encountering is stemming from Merge Request #929.

This merge request introduced a feature designed to operate in a browser environment and to catch a failed dynamic import.

Even if chromadb was built for Browser environments, the dynamic import would still encounter issues if the dependency is not added. This is because the web framework's bundler attempts to locate that module, leading to a failure.

The current design of chromadb does not really support usage within a browser environment. If you're using Next.js, a recommended workaround would be to create a server-side API that interacts with chromadb and have your front-end code make requests to that API.

@jeffchuber to have chromadb running in next.js you could define nextjs-chroma/pages/api/hello.js with the following content:

import { ChromaClient } from 'chromadb';

// API route: this runs only on the server, so chromadb stays out of the browser bundle.
export default async function handler(req, res) {
  const client = new ChromaClient();
  // heartbeat() returns a promise; await it so the JSON contains the value, not {}.
  const heartbeat = await client.heartbeat();
  res.status(200).json({ heartbeat });
}

This will work when the dynamic import (await import("@visheratin/web-ai")) has been removed from the code so that the next.js bundler will not try to locate the module.
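On the front end, the page would then call that route instead of importing chromadb itself. A minimal sketch in TypeScript, assuming the route above is served at /api/hello and responds with { heartbeat }. The fetchHeartbeat helper and its injectable fetchImpl parameter are hypothetical names added for illustration, not part of chromadb or Next.js:

```typescript
// Shape of the JSON the API route is assumed to return.
interface HeartbeatResponse {
  heartbeat: unknown;
}

// Minimal fetch-like signature so the helper can be exercised without a running server.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}>;

// Hypothetical helper: asks the server route for the heartbeat instead of
// importing chromadb in browser code.
async function fetchHeartbeat(
  fetchImpl: FetchLike,
  url: string = "/api/hello"
): Promise<HeartbeatResponse> {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Heartbeat request failed with status ${response.status}`);
  }
  return (await response.json()) as HeartbeatResponse;
}
```

In a client component you would pass the browser's own fetch, for example fetchHeartbeat(fetch). Passing the fetch implementation in keeps the helper easy to test with a stub.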

perzeuss commented 1 year ago

I noticed that we're already employing a workaround for dynamic imports in the TransformersEmbeddingFunction.ts file. This workaround can be seen in the comment on this TypeScript issue.

We could apply a similar solution to the WebAIEmbeddingFunction.ts file. However, please note that our JavaScript package is currently not fully optimized for browser environments.

To implement this workaround, we need to modify the asynchronous import const webAI = await import("@visheratin/web-ai");. It should be wrapped as follows: const webAI = await Function('return import("@visheratin/web-ai")')();. This modification will enable the package to function more effectively in a browser environment.
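A minimal sketch of that wrapping in TypeScript, written as a generic helper. The names hiddenImport and loadOptional are hypothetical and this is not the code in the chromadb package; it only shows the technique of building the import call at run time so that a bundler's static analysis never sees the specifier:

```typescript
// Builds the import() call from a string at run time. Because the call is not
// present in the source as real syntax, a bundler does not try to resolve the
// module at build time.
function hiddenImport(specifier: string): Promise<unknown> {
  const importer = new Function("s", "return import(s)") as (
    s: string
  ) => Promise<unknown>;
  return importer(specifier);
}

// Hypothetical wrapper: resolves to null instead of throwing when the optional
// package is not installed (or cannot be loaded for any other reason).
async function loadOptional(specifier: string): Promise<unknown | null> {
  try {
    return await hiddenImport(specifier);
  } catch {
    return null;
  }
}
```

One caveat with this approach: it relies on run-time code evaluation, so a Content Security Policy that disallows unsafe-eval will block it in the browser.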

BjoernRave commented 1 year ago

I am trying to run this file in nextjs:

// app/api/chat/route.ts

import { Message, StreamingTextResponse } from 'ai'
import { ConversationalRetrievalQAChain } from 'langchain/chains'
import { ChatOpenAI } from 'langchain/chat_models/openai'
import { OpenAIEmbeddings } from 'langchain/embeddings/openai'
import {
  AIMessage,
  FunctionMessage,
  HumanMessage,
  SystemMessage,
} from 'langchain/schema'
import { Chroma } from 'langchain/vectorstores/chroma'

export const initVectorDB = async (collection: string) => {
  const vectorStore = await Chroma.fromExistingCollection(
    new OpenAIEmbeddings(),
    { collectionName: collection }
  )

  return vectorStore
}

export const convertMessagesToLangChain = (messages: Message[]) => {
  const allMessages = [...messages]
  const lastMessage = allMessages.splice(-1, 1)

  const newMessages = []

  for (const message of allMessages) {
    if (message.role === 'user') {
      newMessages.push(new HumanMessage(message.content))
    } else if (message.role === 'assistant') {
      newMessages.push(new AIMessage(message.content))
    } else if (message.role === 'system') {
      newMessages.push(new SystemMessage(message.content))
    } else {
      // Remaining role is 'function': keep the function name with its content
      newMessages.push(new FunctionMessage(message.content, message.name))
    }
  }

  return {
    langChainMessages: newMessages,
    question: lastMessage[0].content,
  }
}

export const runtime = 'nodejs'

export async function POST(req: Request) {
  const json = await req.json()
  const messages: Message[] = json.messages

  const vectorStore = await initVectorDB('zustand')

  console.log(messages)
  const model = new ChatOpenAI({
    openAIApiKey: process.env.OPENAI_API_KEY as string,
    modelName: 'gpt-3.5-turbo-16k-0613',
  })

  const chain = ConversationalRetrievalQAChain.fromLLM(
    model,
    vectorStore.asRetriever(),
    {}
  )
  const { langChainMessages, question } = convertMessagesToLangChain(messages)
  console.log(langChainMessages, question, 'question')
  const stream = await chain.stream({
    chat_history: langChainMessages,
    question,
  })

  return new StreamingTextResponse(stream)
}

And I keep getting: Module not found: Can't resolve '@visheratin/web-ai'

which suggests that it thinks chroma is running in the browser; however, AFAIK this is only running on the server (I am new to the app directory)

perzeuss commented 1 year ago

@BjoernRave the issue is that the build tools try to locate the module even though the import only takes effect in browser environments. Without the fix in #956 you will get this error in all environments because it is thrown at build time. A step in the Next.js bundle/build process simply performs the following check: "there is a module import in the code, I need to check that this module is installed".
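To illustrate why guarding the import with try/catch does not help at build time, here is a toy sketch of that kind of check in TypeScript. It is not the actual Next.js or webpack implementation (real bundlers parse the code rather than scanning text); findImportSpecifiers and its regular expression are hypothetical and only handle simple string-literal specifiers:

```typescript
// Toy static scan: collects string-literal specifiers from `import ... from "x"`,
// `import("x")`, and `require("x")` without ever executing the code.
function findImportSpecifiers(source: string): string[] {
  const pattern =
    /(?:\bfrom\s*|\bimport\s*\(\s*|\brequire\s*\(\s*)["']([^"']+)["']/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    // Record each specifier once, in order of first appearance.
    if (found.indexOf(match[1]) === -1) {
      found.push(match[1]);
    }
  }
  return found;
}

// The try/catch only matters at run time; the scan still reports the module.
const guarded =
  'try { await import("@visheratin/web-ai"); } catch (e) { /* fallback */ }';
console.log(findImportSpecifiers(guarded)); // logs an array containing "@visheratin/web-ai"
```

A build step that then tries to resolve every reported specifier fails as soon as one of them is not installed, regardless of which environment the code would later run in.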

The chromadb package does not provide separate clients for Node and browser environments. That means the code for the browser environment is also loaded when you use chromadb in Next.js server routes or any other Node environment. The scripts in the chromadb package just detect that you are running in a Node environment and only execute the code for Node. The same applies to browser environments.
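A rough sketch of what such run-time detection can look like, in TypeScript. This is an assumption for illustration, not chromadb's actual check; detectRuntime and the Runtime type are hypothetical names:

```typescript
type Runtime = "node" | "browser" | "unknown";

// Hypothetical check, not chromadb's actual code: inspects well-known globals
// to guess where the code is running. The globals object is injectable so the
// function can be exercised outside a real browser.
function detectRuntime(globals: Record<string, any> = globalThis): Runtime {
  if (globals.process && globals.process.versions && globals.process.versions.node) {
    return "node";
  }
  if (typeof globals.window !== "undefined" && typeof globals.document !== "undefined") {
    return "browser";
  }
  return "unknown";
}
```

The important point is that this decision happens when the code runs. Both branches still ship in the same package, so the bundler sees the imports of both of them at build time.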

perzeuss commented 1 year ago

@jeffchuber I just noticed that there is actually an ESM build, but it is not generated. I overlooked that because I just wanted to quickly fix the error here 😅 The yarn build script tries to run build:main and build:module via build:*. However, the build is currently failing on my machine due to a typescript error, which is why build:main aborts execution and build:module is never executed.

This error occurs only in my setup because I have installed chroma as a git submodule, and yarn has issues with a node_modules folder in both the parent and the js client folder.

BjoernRave commented 1 year ago

When I tried to run it in pure node with tsx, for example, it didn't require that dep. I tried installing that dep, and then I get:

./node_modules/chromadb/dist/main/embeddings/WebAIEmbeddingFunction.js:157:74
Module not found: Package path . is not exported from package /Users/bjoernrave/projects/talk-to-docs/node_modules/@visheratin/web-ai (see exports field in /Users/bjoernrave/projects/talk-to-docs/node_modules/@visheratin/web-ai/package.json)

So is this something that needs to be fixed on the chroma side? Or langchain, or nextjs? (So many parts :D)

perzeuss commented 1 year ago

You are using the CJS build. At the moment you have to import chroma from chromadb/dist/module, like import { ChromaClient } from 'chromadb/dist/module';, when you try to use it in a browser environment. I think when using langchain it would be import { Chroma } from 'langchain/vectorstores/chroma/dist/module'.

And then, to make it work, you'd have to install @visheratin/web-ai and its peer dependencies onnxruntime-web, @visheratin/tokenizers, and jimp. When you run yarn add @visheratin/web-ai onnxruntime-web @visheratin/tokenizers jimp and import the ESM build from chromadb/dist/module, it should work.

BjoernRave commented 1 year ago

Now I get this error:

Module not found: Package path ./dist/vectorstores/chroma is not exported from package /Users/bjoernrave/projects/talk-to-docs/node_modules/langchain (see exports field in /Users/bjoernrave/projects/talk-to-docs/node_modules/langchain/package.json)

There is a PR which should fix this issue though, right? So I guess I will have to wait for that to land

chekun commented 1 year ago

Any updates?

helxsz commented 11 months ago

After installing the dependencies with npm i @visheratin/web-ai onnxruntime-web @visheratin/tokenizers jimp, I still got more errors.

- warn ./node_modules/_chromadb@1.5.6@chromadb/dist/module/embeddings/CohereEmbeddingFunction.js
Module not found: Can't resolve 'cohere-ai'

./node_modules/_chromadb@1.5.6@chromadb/dist/module/embeddings/OpenAIEmbeddingFunction.js
Module not found: Can't resolve 'openai'

./node_modules/_chromadb@1.5.6@chromadb/dist/module/embeddings/WebAIEmbeddingFunction.js
Module not found: Can't resolve '@visheratin/web-ai-node' 

./node_modules/_chromadb@1.5.6@chromadb/dist/module/embeddings/WebAIEmbeddingFunction.js
Module not found: Can't resolve '@visheratin/web-ai-node/text'

./node_modules/_chromadb@1.5.6@chromadb/dist/module/embeddings/WebAIEmbeddingFunction.js
Module not found: Can't resolve '@visheratin/web-ai-node/image'

./node_modules/_chromadb@1.5.6@chromadb/dist/module/embeddings/WebAIEmbeddingFunction.js
Module not found: Can't resolve '@visheratin/web-ai-node/multimodal'

<w> [webpack.cache.PackFileCacheStrategy] Caching failed for pack: Error: ENOENT: no such file or directory, rename '/Users/my_projects/my-app3/.next/cache/webpack/client-development-fallback/3.pack.gz_' -> 

perzeuss commented 11 months ago

@helxsz the warnings can be ignored.

Edit 1: Are you sure the webpack errors are caused by installing chromadb?

Edit 2: And do you still have the issue when you uninstall the webai-related dependencies and use chromadb without an embedding function?

perzeuss commented 11 months ago

I just realized the webai embedding plugin is initialized in node mode. You need to provide a flag when you initialize the webai embedding function to actually run the ESM code and not the code for Node.js.

Edit 1: new WebAIEmbeddingFunction('text', false /* <- this should prevent loading the nodejs dependencies*/) should work. @helxsz Did you provide a truthy value for the second parameter? Because it should try to load the web version by default but in your case it is trying to load the node version.