BAAI-Agents / Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle helps agents ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
https://baai-agents.github.io/Cradle/
MIT License
1.56k stars, 141 forks

About text embedding from text tokens. #48

Closed: ZhaojunCP closed this issue 1 month ago

ZhaojunCP commented 1 month ago

Thank you for your work! May I ask whether you pass the tokens of the text to text-embedding-ada-002? As far as I know, text-embedding-ada-002 requires a string rather than a list of integer tokens. Could you explain this? Thank you.

DVampire commented 1 month ago

Our OpenAI LLM provider is primarily based on LangChain's implementation. The reference code and the reasons for this approach are as follows:

  1. The LangChain code is https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/embeddings/openai.py. At line 397, LangChain notes that its implementation is based on OpenAI's cookbook.
  2. The OpenAI cookbook link is https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb. It is written this way primarily to handle embedding texts that are longer than the model's maximum context length.

Please refer to the above code. If you have any questions, feel free to contact us. Thanks.
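
For readers following the links, here is a minimal sketch of the general idea behind the cookbook approach: split the token list into chunks that fit the context window, embed each chunk, then combine the chunk embeddings with a length-weighted average and normalize the result. It is not Cradle's or LangChain's actual code. The names `chunk_tokens`, `embed_long_tokens`, and `embed_fn` are hypothetical, and `embed_fn` stands in for whatever call returns an embedding for one chunk of tokens. The OpenAI embeddings endpoint accepts an array of token integers as input as well as a string, which is why the chunks can be passed as tokens.

```python
import math
from typing import Callable, List, Sequence


def chunk_tokens(tokens: Sequence[int], chunk_len: int) -> List[List[int]]:
    """Split a token sequence into consecutive chunks of at most chunk_len tokens."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [list(tokens[i:i + chunk_len]) for i in range(0, len(tokens), chunk_len)]


def embed_long_tokens(
    tokens: Sequence[int],
    embed_fn: Callable[[List[int]], List[float]],  # hypothetical: embeds one token chunk
    chunk_len: int = 8191,  # assumed limit; adjust to the model in use
) -> List[float]:
    """Embed a long token sequence chunk by chunk and merge the results.

    Chunk embeddings are averaged with weights equal to the chunk lengths,
    then the average is scaled to unit length.
    """
    chunks = chunk_tokens(tokens, chunk_len)
    if not chunks:
        raise ValueError("tokens must not be empty")

    vectors = [embed_fn(chunk) for chunk in chunks]
    weights = [len(chunk) for chunk in chunks]
    total = sum(weights)
    dim = len(vectors[0])

    # Length-weighted average of the chunk embeddings.
    averaged = [
        sum(vec[d] * w for vec, w in zip(vectors, weights)) / total
        for d in range(dim)
    ]

    # Normalize to unit length (skip if the average is the zero vector).
    norm = math.sqrt(sum(x * x for x in averaged))
    return [x / norm for x in averaged] if norm > 0 else averaged
```

In practice, the tokens would typically come from `tiktoken` (for example `tiktoken.encoding_for_model("text-embedding-ada-002").encode(text)`), and `embed_fn` could wrap a call such as `client.embeddings.create(model="text-embedding-ada-002", input=chunk).data[0].embedding` from the `openai` Python package. Both are left out of the sketch so that it runs without network access.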