It seems that the `except` clause here can cause unexpected results: the evaluation of the probabilities can fail, yet the code will still silently return a score.
In my opinion it would be good to give the user of the metric the ability to choose between:
- logprobs sum mode
- top token mode (for cases where the logprobs vector is not exposed by the evaluator model, or if the user simply doesn't care about the quality guarantee in the paper, which is specified for the probability sum method)
I understand that in some cases the evaluator model doesn't give access to the logprobs vector. In that case, if the user selected logprobs sum mode I would like them to see an error explaining that the probabilities are not available, and to use top token mode instead.
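A minimal sketch of what I have in mind (all names here, `ScoreMode` and `compute_score`, are hypothetical and not part of deepeval's current API):

```python
from enum import Enum
from typing import Dict, Optional


class ScoreMode(Enum):
    """Hypothetical user-facing setting for how the final score is derived."""
    LOGPROBS_SUM = "logprobs_sum"  # probability-weighted sum, as in the G-Eval paper
    TOP_TOKEN = "top_token"        # trust the single score token the model emitted


def compute_score(
    raw_score: float,
    score_logprobs: Optional[Dict[int, float]],
    mode: ScoreMode,
) -> float:
    """Return the metric score, failing loudly instead of silently falling back.

    `score_logprobs` maps candidate score tokens to their probabilities; it is
    None when the evaluator model does not expose logprobs.
    """
    if mode is ScoreMode.LOGPROBS_SUM:
        if score_logprobs is None:
            # Explicit error instead of a silent fallback to the raw score
            raise ValueError(
                "The evaluator model does not expose token logprobs; "
                "use ScoreMode.TOP_TOKEN instead."
            )
        # Probability-weighted sum of candidate scores (normalized, since the
        # returned top-k probabilities may not sum to exactly 1)
        total = sum(score_logprobs.values())
        return sum(s * p for s, p in score_logprobs.items()) / total
    # TOP_TOKEN mode: use the score parsed directly from the model's output
    return raw_score
```

The point of the sketch is only the control flow: in logprobs sum mode a missing logprobs vector raises rather than degrading silently, while top token mode works with any evaluator model.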
If there is agreement amongst the core team, I would be happy to contribute this functionality.
https://github.com/confident-ai/deepeval/blob/e49714ddd0fe523d27ef4c8b2477eeaeb7128ef2/deepeval/metrics/g_eval/g_eval.py#L315