Open kaan9700 opened 2 weeks ago
Yes, it is normal for the individual values of the Context Precision metric to always be binary (0 or 1) for each data point. This is confirmed by the `ContextPrecisionVerification` class, which defines the `verdict` field as a binary (0/1) value:
```python
class ContextPrecisionVerification(BaseModel):
    """Answer for the verification task whether the context was useful."""

    reason: str = Field(..., description="Reason for verification")
    verdict: int = Field(..., description="Binary (0/1) verdict of verification")
```
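To illustrate how such a verdict is produced, here is a minimal stdlib sketch that mirrors the model above (the real class is a pydantic `BaseModel`; the raw JSON string below is a hypothetical LLM response, not actual ragas output):

```python
import json
from dataclasses import dataclass

# Stdlib stand-in for ContextPrecisionVerification: each parsed
# verdict is constrained to the binary values 0 or 1.
@dataclass
class ContextPrecisionVerification:
    reason: str
    verdict: int  # 1 = context was useful, 0 = not

    def __post_init__(self) -> None:
        if self.verdict not in (0, 1):
            raise ValueError("verdict must be 0 or 1")

# Hypothetical raw LLM output for one (question, answer, context) triple.
raw = '{"reason": "The context contains the answer.", "verdict": 1}'
v = ContextPrecisionVerification(**json.loads(raw))
print(v.verdict)  # -> 1
```

Because every verdict passes through this 0/1 gate, no intermediate value can appear at the per-context level.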
Additionally, the prompt for the `CONTEXT_PRECISION` metric instructs the model to give a verdict of "1" if useful and "0" if not:
```python
CONTEXT_PRECISION = Prompt(
    name="context_precision",
    instruction="""Given question, answer and context verify if the context was useful in arriving at the given answer. Give verdict as "1" if useful and "0" if not with json output.""",
    ...
)
```
The Context Precision metric evaluates whether all of the ground-truth relevant items present in the `contexts` are ranked highly. It is computed from the `question`, `ground_truth`, and `contexts`, with values ranging between 0 and 1, where higher scores indicate better precision. The relevance indicator at each rank, denoted $v_k$, is binary (0 or 1), confirming that the individual verdicts are indeed binary [1][2].
I have noticed that, looking at the evaluated dataset, the metric is computed individually for each data point, and the console output is then the mean of all those per-data-point values. With Context Precision, the individual values (one per data point) are always binary, i.e. 0 or 1. Is this normal? Is the value range for individual data points really limited to 0 and 1, or have I made a mistake?
This can be reproduced by using the small test set from the RAGAS documentation and saving the generated dataframe as a CSV.