Auto-Playground / ragrank

🎯 A free LLM evaluation toolkit for assessing factual accuracy, context understanding, tone, and more, so you can see how well your LLM applications perform.
https://ragrank.readthedocs.io/
Apache License 2.0

LLM Wrapper Error: ValueError: OPENAI_API_KEY not found in the environmen(t). #46

Open antoninoLorenzo opened 4 weeks ago

antoninoLorenzo commented 4 weeks ago

Code

I tried to use evaluate with a LangchainLLMWrapper, but for some reason it still requires an OpenAI key. Here is the code:

from ragrank import evaluate
from ragrank.evaluation import EvalResult
from ragrank.integrations.langchain import LangchainLLMWrapper
from ragrank.dataset import from_dataframe
from ragrank.metric import (
    context_relevancy,
    response_relevancy
)
from langchain_community.chat_models import ChatOllama

# df is a pandas DataFrame prepared beforehand (see the sketch below)
rr_dataset = from_dataframe(df)

ollama_llm = ChatOllama(model='gemma:2b')
ragrank_llm = LangchainLLMWrapper(llm=ollama_llm)

result: EvalResult = evaluate(
    dataset=rr_dataset,
    llm=ragrank_llm,
    metrics=[
        response_relevancy,
        context_relevancy,
    ],
)
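
For reference, df is a pandas DataFrame built beforehand (not shown above). A minimal sketch of what it might contain, where the column names question, context, and response are my assumption based on the metrics used, not something confirmed here:

import pandas as pd

# Hypothetical sample data; check the ragrank dataset docs for the exact
# schema that from_dataframe expects.
df = pd.DataFrame(
    {
        "question": ["What is the capital of France?"],
        "context": [["Paris is the capital and largest city of France."]],
        "response": ["The capital of France is Paris."],
    }
)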

PS: I do not usually work with LangChain, so there is a chance I did something wrong with it; however, even if that is the case, the error raised makes it unclear whether this is a bug in the library.

Problem

Inspecting further, I found that it happens here:

File D:\...\.venv\Lib\site-packages\ragrank\evaluation\base.py:74, in <listcomp>(.0)
     70     metrics = [metrics]
     72 dt = time()
     73 scores = [
---> 74     [
     75         metric.score(datanode).score
     76         for datanode in dataset.with_progress("Evaluating")
     77     ]
     78     for metric in metrics
     79 ]

     ...

     File D:\...\.venv\Lib\site-packages\ragrank\metric\_response_related\relevancy.py:74, in ResponseRelevancy.score(self, data)
     72 prompt_str = self.prompt.to_string()
     73 prompt_dt = prompt_str.format(**data.model_dump())
---> 74 response = self.llm.generate_text(
     75     prompt_dt,
     76 )
     77 try:
     78     score = float(response.response)
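
From the traceback, each metric calls its own self.llm inside score(), so if the metric objects are constructed with a default OpenAI-backed LLM, the llm passed to evaluate never reaches them. A minimal sketch of that suspected pattern (simplified and hypothetical, not ragrank's actual code):

import os

class DefaultOpenAILLM:
    """Stand-in for a default backend that requires an OpenAI key."""

    def generate_text(self, prompt):
        # Mimics the ValueError reported above (message copied verbatim, typo included).
        if "OPENAI_API_KEY" not in os.environ:
            raise ValueError("OPENAI_API_KEY not found in the environmen.")
        return "0.5"

class Metric:
    def __init__(self, llm=None):
        # The metric builds its own default LLM when none is supplied, so an llm
        # passed only to evaluate() never reaches this attribute.
        self.llm = llm or DefaultOpenAILLM()

    def score(self, datanode):
        return self.llm.generate_text(str(datanode))

If that is what happens, only replacing the LLM on the metric instances themselves would take effect, which is what I tried next.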

Solution Attempt

I read through the source code and tried to find a workaround; this was my attempt:

...
from ragrank.metric._response_related.relevancy import ResponseRelevancy
from ragrank.metric._context_related.relevancy import ContextRevevancy

resp_r = ResponseRelevancy()
resp_r.llm = ragrank_llm  # override the metric's default LLM with the wrapped Ollama model

cont_r = ContextRevevancy()
cont_r.llm = ragrank_llm

result: EvalResult = evaluate(
    dataset=rr_dataset,
    llm=ragrank_llm,
    metrics=[
        resp_r,
        cont_r,
    ],
)

However, it yielded another error, KeyError: 'token_usage':

File D:\...\.venv\Lib\site-packages\ragrank\metric\_response_related\relevancy.py:74, in ResponseRelevancy.score(self, data)
     72 prompt_str = self.prompt.to_string()
     73 prompt_dt = prompt_str.format(**data.model_dump())
---> 74 response = self.llm.generate_text(
     75     prompt_dt,
     76 )
     77 try:
     78     score = float(response.response)

File D:\...\.venv\Lib\site-packages\ragrank\integrations\langchain\langchain_llm_wrapper.py:97, in LangchainLLMWrapper.generate_text(self, text)
     93 langchain_result: LangchainLLMResult = (
     94     self.llm.generate_prompt(prompts=[prompt])
     95 )
     96 message = langchain_result.generations[0][0].text
---> 97 response_tokens = langchain_result.llm_output["token_usage"][
     98     "completion_tokens"
     99 ]
    100 response_time = time() - start_time
    102 result = LLMResult(
    103     response=message,
    104     response_time=response_time,
   (...)
    107     llm_config=self.llm_config,
    108 )

At that point I don't know what else to try by digging deeper into the source code (i.e. I can't propose a solution), so I am pointing out the problem in the hope that you can improve the library (or tell me what I am doing wrong).

PS: 'ValueError: OPENAI_API_KEY not found in the environmen.' has a typo; it is missing the 't'.

github-actions[bot] commented 4 weeks ago

Thanks a lot for posting your first issue!

izam-mohammed commented 2 weeks ago

Thanks for pointing this out @antoninoLorenzo. I appreciate that you tried to solve it on your own. It is actually a bug in the LangChain integration; it will be resolved in the next release (0.0.8).

I assume the first problem arose because the file evaluation/base.py imports default_llm; that needs to change. We are also accessing the token_usage property, which only a few LLMs in LangChain provide, so the langchain integration should stop relying on the token_usage attribute.
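
For example, the wrapper could read the token count defensively instead of assuming every LangChain backend reports it. A rough sketch only; the None fallback is an assumption, not the final fix:

# Rough sketch of a defensive lookup. llm_output is whatever the LangChain
# backend returns; many backends (e.g. ChatOllama) do not include "token_usage".
def completion_tokens_or_none(llm_output):
    if not llm_output:
        return None
    token_usage = llm_output.get("token_usage") or {}
    return token_usage.get("completion_tokens")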

If possible, please raise a PR as a solution.