Open · TM02 opened this issue 2 months ago
Hey there, @TM02! 👋 I'm here to help you with any bugs, questions, or becoming a contributor. I'll assist you while we wait for a human maintainer to join.
To configure Ragas to handle larger context windows with Ollama's LLMs, you need to adjust the `max_tokens` parameter when initializing your custom LLM. Here's how you can do it:

1. Set the `max_tokens` parameter in your custom LLM initialization to handle larger context windows:

```python
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

llm = ChatOllama(model="deepseek-coder-v2:latest", max_tokens=16000)  # Adjust max_tokens as needed
emb = OllamaEmbeddings(model="nomic-embed-text")

init_ragas_metrics(
    metrics,
    llm=LangchainLLMWrapper(llm),
    embedding=LangchainEmbeddingsWrapper(emb),
)
```
2. Ensure that `LangchainLLMWrapper` and `LangchainEmbeddingsWrapper` are correctly set up to use these custom models:

```python
from ragas.metrics import answer_relevancy
from ragas.run_config import RunConfig
from ragas.metrics.base import MetricWithLLM, MetricWithEmbeddings

metrics = [answer_relevancy]

def init_ragas_metrics(metrics, llm, embedding):
    for metric in metrics:
        if isinstance(metric, MetricWithLLM):
            metric.llm = llm
        if isinstance(metric, MetricWithEmbeddings):
            metric.embeddings = embedding
        run_config = RunConfig()
        metric.init(run_config)

llm = ChatOllama(model="deepseek-coder-v2:latest", max_tokens=16000)  # Adjust max_tokens as needed
emb = OllamaEmbeddings(model="nomic-embed-text")

init_ragas_metrics(
    metrics,
    llm=LangchainLLMWrapper(llm),
    embedding=LangchainEmbeddingsWrapper(emb),
)
```
By setting the `max_tokens` parameter to a higher value, you can ensure that Ragas handles larger context windows, similar to 'gpt-3.5-16k' [1].
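One caveat worth checking: depending on your `langchain_community` version, the context window for Ollama-served models is controlled by `num_ctx` rather than `max_tokens` (which may only cap the generated output). A minimal sketch of that alternative; treat the parameter choice as an assumption to verify against your installed version:

```python
from langchain_community.chat_models import ChatOllama

# Assumption: num_ctx sets the prompt context window for Ollama models;
# max_tokens (where supported) typically limits only the generated output.
llm = ChatOllama(
    model="deepseek-coder-v2:latest",
    num_ctx=16384,  # adjust to what the model actually supports
)
```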
To continue talking to Dosu, mention @dosu.
@dosu I tried your solution, but it still doesn't work. Perhaps it's due to the model's context window limitation, or because Ragas only allows 'gpt-3.5-16k' to handle larger document sizes without reduction. When I exceed Ragas' context window limit, it keeps returning 'nan'.
To handle larger document sizes with Ollama's LLMs and resolve the 'nan' issue when exceeding the context window limit, you can use the `LangchainLLMWrapper` class with a properly configured `RunConfig`. Here's how you can do it:

1. Configure the `LangchainLLMWrapper` with a larger context window:

```python
from ragas.run_config import RunConfig
from langchain_community.chat_models import ChatOllama
from ragas.llms.base import LangchainLLMWrapper

# Create a RunConfig with a longer timeout for slow local models
run_config = RunConfig(timeout=60)  # Adjust timeout as needed

# Initialize the language model with the desired model name and context size
ollama_model = ChatOllama(model="deepseek-coder-v2:latest", max_tokens=16000)  # Adjust max_tokens as needed

# Wrap the language model with LangchainLLMWrapper
llm_wrapper = LangchainLLMWrapper(ollama_model, run_config)

# Now you can use llm_wrapper to generate text with the larger context window
```
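Beyond `timeout`, `RunConfig` also governs retries and concurrency, which matters for slow local models. A hedged sketch of a more forgiving configuration; the exact field set depends on your Ragas version, so treat these keyword arguments as assumptions to verify:

```python
from ragas.run_config import RunConfig

# Assumption: these fields exist on RunConfig in recent Ragas releases.
run_config = RunConfig(
    timeout=180,     # seconds before a single LLM call is abandoned
    max_retries=10,  # retries on transient failures
    max_wait=60,     # cap on backoff between retries
    max_workers=4,   # concurrent requests; keep low for a local Ollama server
)
```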
2. Initialize the metrics with the wrapped LLM and embeddings:

```python
from ragas.metrics import answer_relevancy
from ragas.metrics.base import MetricWithLLM, MetricWithEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.run_config import RunConfig
from langchain_community.embeddings import OllamaEmbeddings

metrics = [answer_relevancy]

def init_ragas_metrics(metrics, llm, embedding):
    for metric in metrics:
        if isinstance(metric, MetricWithLLM):
            metric.llm = llm
        if isinstance(metric, MetricWithEmbeddings):
            metric.embeddings = embedding
        run_config = RunConfig()
        metric.init(run_config)

emb = OllamaEmbeddings(model="nomic-embed-text")

init_ragas_metrics(
    metrics,
    llm=llm_wrapper,
    embedding=LangchainEmbeddingsWrapper(emb),
)
```
3. The `RagasoutputParser` class retries parsing if an `OutputParserException` is encountered. Ensure this mechanism is in place:

```python
class RagasoutputParser(PydanticOutputParser):
    async def aparse(self, result: str, prompt: PromptValue, llm: BaseRagasLLM, max_retries: int = 1):
        try:
            output = super().parse(result)
        except OutputParserException:
            if max_retries != 0:
                p_value = FIX_OUTPUT_FORMAT.format(prompt=prompt.to_string(), completion=result)
                output = await llm.generate(p_value)
                result = output.generations[0][0].text
                return await self.aparse(result, prompt, llm, max_retries - 1)
            else:
                logger.warning("Failed to parse output. Returning None.")
                return None
        return output
```
By configuring the `LangchainLLMWrapper` with a larger context window and ensuring proper error handling, you should be able to handle larger document sizes and mitigate the 'nan' issue [1][2].
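If scores still come back as 'nan', it helps to pinpoint which rows failed rather than averaging over them. A minimal sketch, assuming the standard `ragas.evaluate` entry point and the usual `question`/`answer`/`contexts` dataset layout; the toy data here is purely illustrative, and `metrics`, `llm_wrapper`, and `emb` are reused from the snippets above:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.embeddings import LangchainEmbeddingsWrapper

# Hypothetical toy dataset; replace with your own evaluation data.
ds = Dataset.from_dict({
    "question": ["What does RunConfig control?"],
    "answer": ["It controls timeouts and retries during evaluation."],
    "contexts": [["RunConfig holds timeout, retry, and concurrency settings."]],
})

result = evaluate(ds, metrics=metrics, llm=llm_wrapper,
                  embeddings=LangchainEmbeddingsWrapper(emb))

df = result.to_pandas()
# Rows where the judge LLM failed (parsing errors, context overflow)
# surface as NaN scores; inspect these instead of the aggregate.
print(df[df["answer_relevancy"].isna()])
```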
To continue talking to Dosu, mention @dosu.
It still shows the same error.
@TM02 this could be an issue with the model's context length, but it is recommended to use a better model for evals so that you can trust the model-graded results.
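For example, a stronger judge model can be swapped in through the same wrapper. A minimal sketch, assuming `langchain_openai` is installed and an API key is configured; the specific model name is just an illustration, and `init_ragas_metrics`, `metrics`, and `emb` are reused from the snippets above:

```python
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

# Assumption: any LangChain chat model can serve as the judge via the wrapper.
judge = ChatOpenAI(model="gpt-4o", temperature=0)

init_ragas_metrics(
    metrics,
    llm=LangchainLLMWrapper(judge),
    embedding=LangchainEmbeddingsWrapper(emb),
)
```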
- [x] I checked the documentation and related resources and couldn't find an answer to my question.
I integrated Langfuse with Ragas for evaluation purposes. Issue #53 mentions that by default "we are using 'gpt-3.5-16k', so you don't have to reduce document size when running the evals." However, when using different Ollama LLMs such as llama3, mistral, and phi3, I noticed that Ragas limits the context window to between 500 and 700 tokens. I want it to handle larger document sizes, similar to 'gpt-3.5-16k'. Is this possible?