Open Superyeahh opened 4 days ago
Are you using the openai models?
No. I suspect the prompts used by the evaluation metrics in the code are written in English and don't work well for other languages. I translated the answers to the questions into English, and now the problem is mostly solved.
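A minimal sketch of the workaround described above: translate the non-English fields into English before scoring, since the metric prompts are English. The `translate` helper here is a hypothetical stand-in (not part of Ragas); wire in whatever machine-translation client you actually use.

```python
# Sketch: translate answer/ground-truth fields to English before evaluation.
# `translate` is a hypothetical placeholder, not a Ragas or OpenAI API --
# replace its body with a call to a real MT service.

def translate(text: str, target: str = "en") -> str:
    """Placeholder translation hook; returns the input unchanged."""
    return text  # swap in a real machine-translation call here

def translate_row(row: dict) -> dict:
    """Return a copy of one evaluation row with its free-text fields
    passed through the translation hook."""
    return {
        **row,
        "answer": translate(row["answer"]),
        "ground_truth": translate(row["ground_truth"]),
    }
```

The translated rows can then be fed to the evaluation exactly as before; only the text the English prompts see has changed.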
[ ] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
(1) For cases where `answer`, `context`, and `ground_truth` are long (about 500 words each), how can the code be modified to make the assessment more effective? Currently some metrics, such as `context_recall` and `faithfulness`, come back empty (not 0.0).
(2) Is it normal for a large number of `answer_relevancy` values to be 0.0?
(3) Is it also normal for `context_precision` to take only the two values 0.9999999 and 0.0?
Thanks for your answer!
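For question (1), one common mitigation is to split very long context strings into shorter chunks before evaluation, so each metric prompt sees smaller passages. The sketch below is an assumption, not Ragas functionality: the field names (`question`, `answer`, `contexts`, `ground_truth`) follow the usual Ragas dataset schema, but the chunking helpers are hypothetical.

```python
# Hypothetical preprocessing sketch: break each long context string into
# word-bounded chunks before building the evaluation dataset. The helpers
# below are not part of Ragas; they only reshape the input rows.

def chunk_text(text: str, max_words: int = 150) -> list[str]:
    """Split a string into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def prepare_row(row: dict, max_words: int = 150) -> dict:
    """Return a copy of one row whose long contexts are split into
    several shorter context entries."""
    contexts: list[str] = []
    for ctx in row["contexts"]:
        contexts.extend(chunk_text(ctx, max_words))
    return {**row, "contexts": contexts}

row = {
    "question": "What is X?",
    "answer": "X is ...",
    "contexts": ["word " * 500],  # one ~500-word context string
    "ground_truth": "X is ...",
}
prepared = prepare_row(row)  # contexts is now several ~150-word chunks
# The prepared rows would then be assembled into a dataset and passed to
# ragas.evaluate(...) as usual.
```

Shorter passages also make empty metric results easier to debug, since a single oversized prompt is less likely to be the cause.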