NVIDIA / NeMo-Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

Not hardcoded embedding models #99

Closed: jamescalam closed this 1 year ago

jamescalam commented 1 year ago

Hi, I'd like to propose a fix that lets us set the embedding model used to create the vector representations of rails. My main motivation is to be able to use a service like OpenAI's ada-002, primarily as an easy fix for the large container sizes that come with deploying anything containing guardrails code, and additionally to allow selecting different Hugging Face models if preferred.

I'd propose supporting OpenAI and Hugging Face models in an initial version, but I'm very open to suggestions, and I'd be happy to work on this myself. Is there any preference on how the embedding model should be set? I figure we could do something in config.yaml like:

models:
  - type: main
    engine: openai
    model: text-davinci-003
  - type: embedding
    engine: openai
    model: text-embedding-ada-002

Or with Hugging Face:

models:
  - type: main
    engine: openai
    model: text-davinci-003
  - type: embedding
    engine: huggingface
    model: sentence-transformers/all-MiniLM-L6-v2

drazvan commented 1 year ago

Hi @jamescalam! Your suggestion works.

There is actually already minimal support for changing the embedding model, though only to another SentenceTransformer model: https://github.com/NVIDIA/NeMo-Guardrails/blob/main/nemoguardrails/actions/llm/generation.py#L73

See https://github.com/NVIDIA/NeMo-Guardrails/discussions/97 as well.

models:
  ...
  - type: embedding
    engine: SentenceTransformer
    model: all-MiniLM-L6-v2

We have work in progress on a private branch to enable the integration of any embedding and search provider. A few things are changing, including the interface for EmbeddingsIndex, which needs to be async. I reckon we should be able to push this to the repo next week. If you can, you could help by adding support for OpenAI as an EmbeddingSearchProvider.
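
Roughly, the new async interface will look something like the sketch below (names and signatures are illustrative and may still change before we push):

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexItem:
    """A single indexed item: the text of a rail plus optional metadata."""

    text: str
    meta: Dict = field(default_factory=dict)


class EmbeddingsIndex:
    """Computes and searches a set of embeddings; concrete providers subclass this."""

    async def add_item(self, item: IndexItem):
        """Adds a new item to the index."""
        raise NotImplementedError()

    async def build(self):
        """Builds the index after all items have been added."""
        raise NotImplementedError()

    async def search(self, text: str, max_results: int) -> List[IndexItem]:
        """Returns the closest matches to the provided text."""
        raise NotImplementedError()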

Thanks a lot!

jamescalam commented 1 year ago

@drazvan I've created a PR here: #101

dom-vaz commented 1 year ago

Hi there

I think this approach will work in a lot of cases, but if you have a more complicated enterprise environment it may not be the best solution.

We can already pass a LangChain LLM as the main model to LLMRails. Why not add a second parameter for passing a LangChain embedding model? This would open up a lot of new options without having to implement each of them individually. OpenAI support is already good, but in an enterprise setup it will most likely be AzureOpenAI anyway... See the sketch below.
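
A sketch of what I mean (the llm parameter already works today; the embeddings parameter is hypothetical and just illustrates the suggestion):

from langchain.chat_models import AzureChatOpenAI
from langchain.embeddings import OpenAIEmbeddings

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")

# Passing a LangChain LLM as the main model is already supported.
llm = AzureChatOpenAI(deployment_name="my-gpt4-deployment")

# Any LangChain embeddings object, e.g. an Azure OpenAI embedding deployment.
embeddings = OpenAIEmbeddings(deployment="my-ada002-deployment")

# `llm=` exists today; `embeddings=` would be the proposed second parameter.
rails = LLMRails(config, llm=llm, embeddings=embeddings)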

Kind regards, Dominik

jamescalam commented 1 year ago

Issue was solved with #101

krannnn commented 12 months ago

> Issue was solved with #101

@jamescalam thanks for the contribution. Could you elaborate, or provide a configuration or code snippet, showing how this fix supports Azure OpenAI for embeddings?

Cheers!

NirnayK commented 5 months ago

So has support for custom embedding models been added? I've been searching the repo for an example but couldn't find anything.

drazvan commented 5 months ago

It will be added by #548 and be part of the 0.10.0 release.

NirnayK commented 5 months ago

> It will be added by #548 and be part of the 0.10.0 release.

Let me rephrase my question. Let's say I have my own Qdrant DB that holds embeddings and their corresponding chunks, where the vector embeddings were generated using e5-mistral-7b-instruct (4096 dims). How can I use this with Guardrails? Is it possible to configure a custom embedding search provider so that, when the async search function is called, it calls my embedding model, uses the returned embedding to search my Qdrant DB, and returns a List of IndexItem? Something like the sketch below:
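
A minimal sketch of what I have in mind, assuming the EmbeddingsIndex/IndexItem interface from nemoguardrails.embeddings.index, and with embed_query as a hypothetical helper that calls my e5-mistral-7b-instruct endpoint:

from typing import List

from qdrant_client import QdrantClient

from nemoguardrails.embeddings.index import EmbeddingsIndex, IndexItem


def embed_query(text: str) -> List[float]:
    """Hypothetical helper: embed `text` with e5-mistral-7b-instruct (4096 dims)."""
    raise NotImplementedError


class QdrantEmbeddingsIndex(EmbeddingsIndex):
    """Searches a pre-populated Qdrant collection instead of an in-process index."""

    def __init__(self, collection_name: str, url: str = "http://localhost:6333"):
        self.collection_name = collection_name
        self.client = QdrantClient(url=url)

    @property
    def embedding_size(self) -> int:
        return 4096  # dimensionality of e5-mistral-7b-instruct

    async def add_item(self, item: IndexItem):
        pass  # the collection is already populated, so indexing is a no-op

    async def add_items(self, items: List[IndexItem]):
        pass

    async def build(self):
        pass

    async def search(self, text: str, max_results: int) -> List[IndexItem]:
        # Embed the query with the same model used to build the collection,
        # then return the closest chunks as IndexItem objects.
        vector = embed_query(text)
        hits = self.client.search(
            collection_name=self.collection_name,
            query_vector=vector,
            limit=max_results,
        )
        return [
            IndexItem(text=(hit.payload or {}).get("text", ""), meta=hit.payload or {})
            for hit in hits
        ]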