Closed kauabh closed 3 months ago
Hi, can anyone trigger the workflow & review?
@drazvan thanks for reviewing & the comments, really appreciate that. Regarding commit signing, something seems off at my end, not sure what: some commits are getting signed, and on rebase I'm also getting errors. I will try again, or if push comes to shove, I will close this PR (& the other one too) and create a fresh PR from a proper branch with signed commits.
@drazvan please ignore the review request, it was erroneously triggered
Hi @drazvan, is it okay if I close this PR & https://github.com/NVIDIA/NeMo-Guardrails/pull/666 to create fresh ones? I am able to resolve most of the comments but I'm having issues with signing commits. I will create fresh PRs with properly signed commits.
@kaushikabhishek87 : yes, feel free to create a new one and link these in the comment for reference.
Hi @kaushikabhishek87,
I noticed that you are currently using `base_url` as a parameter in the `__init__` method. While this works for now, I suggest using `**kwargs` to allow for greater flexibility and future-proofing.

By using `**kwargs`, we can easily accommodate additional parameters that might be needed for different models without having to modify the method signature each time. This makes the code more maintainable and extensible.

Here's an example of how you can implement this:
```python
class BasicEmbeddingsIndex(EmbeddingsIndex):
    def __init__(
        self,
        embedding_model: str,
        embedding_engine: str,
        search_threshold: float = None,
        cache_config: Union[Dict, EmbeddingsCacheConfig] = None,
        use_batching: bool = False,
        max_batch_size: int = 10,
        max_batch_hold: float = 0.01,
        **kwargs,
    ):
        """Initialize the BasicEmbeddingsIndex."""
        self.embedding_model = embedding_model
        self.embedding_engine = embedding_engine
        self._embedding_size = 0
        self.search_threshold = search_threshold or float("inf")
        self._parameters = kwargs
        if isinstance(cache_config, Dict):
            self._cache_config = EmbeddingsCacheConfig(**cache_config)
        else:
            self._cache_config = cache_config

    def _init_model(self):
        """Initialize the model used for computing the embeddings."""
        self._model = init_embedding_model(
            embedding_model=self.embedding_model,
            embedding_engine=self.embedding_engine,
            **self._parameters,
        )
```
Additionally, we should update the `__init__` method of all `EmbeddingModel` concrete classes to accept `**kwargs`. Here's an example for the `OpenAIEmbeddingModel` class:
```python
class OpenAIEmbeddingModel(EmbeddingModel):
    """Embedding model using OpenAI API.

    Args:
        embedding_model (str): The name of the embedding model.

    Attributes:
        model (str): The name of the embedding model.
        embedding_size (int): The size of the embeddings.

    Methods:
        encode: Encode a list of documents into embeddings.
    """

    engine_name = "openai"

    def __init__(
        self,
        embedding_model: str,
        **kwargs,
    ):
        self.model = embedding_model
        # do nothing with kwargs; they are accepted only so that
        # engine-agnostic callers can pass extra parameters without
        # breaking this signature
```
This way, any additional parameters can be passed through `kwargs` and handled appropriately.
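To make the pass-through concrete, here is a minimal, self-contained sketch of the pattern. The class and attribute names mirror the snippet above, but `init_embedding_model` is a stub here, and `base_url`/`nomic-embed-text` are just illustrative values:

```python
# Stub standing in for the real model factory: a real implementation
# would construct the engine-specific embedding model.
def init_embedding_model(embedding_model, embedding_engine, **kwargs):
    return {"model": embedding_model, "engine": embedding_engine, **kwargs}


class BasicEmbeddingsIndex:
    def __init__(self, embedding_model, embedding_engine, **kwargs):
        self.embedding_model = embedding_model
        self.embedding_engine = embedding_engine
        # Extra parameters (e.g. base_url) are stored untouched ...
        self._parameters = kwargs

    def _init_model(self):
        # ... and forwarded to the factory when the model is created.
        self._model = init_embedding_model(
            embedding_model=self.embedding_model,
            embedding_engine=self.embedding_engine,
            **self._parameters,
        )


index = BasicEmbeddingsIndex(
    "nomic-embed-text", "ollama", base_url="http://localhost:11434"
)
index._init_model()
print(index._model["base_url"])  # → http://localhost:11434
```

The point is that `base_url` never appears in any signature: it rides along in `**kwargs` from the index to the model factory, so adding another engine-specific parameter later requires no signature change.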
Let me know what you think, and whether I'm overlooking or missing something!

Also, if this would take much of your time, we can do it later in a separate PR. Thanks!
@drazvan what do you think?
@Pouyanpi thanks for the review & comments, they make sense to me. I am closing this PR as per the discussion above & will create a fresh one, including the suggested changes as this conversation unfolds.
Hi, thank you for all the work around guardrails. I am using NeMo in my project & it is really awesome. This is my first contribution to the repo, happy to learn. I am using Ollama as the LLM with NeMo and also wanted to use Ollama embeddings for the embedding part.

This PR facilitates using Ollama embeddings as a choice among embedding models. With this PR, users can use the structure below with Ollama.
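As a rough illustration of what such a configuration could look like (the keys follow NeMo-Guardrails' `models` section in `config.yml`; the model names `llama3` and `nomic-embed-text` are purely illustrative, not part of this PR):

```yaml
models:
  - type: main
    engine: ollama
    model: llama3
  - type: embeddings
    engine: ollama
    model: nomic-embed-text
```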