@FBzzh @yuanpcr you can list all potential metrics for the `validate` task in this issue. For more details about the `validate` task, you can refer to issue #13.
Here are some metrics related to answer groundedness.
- [ ] Knowledge F1. A lexical overlap metric used for knowledge-grounded dialogue, which computes the token-level F1 score between gold passages and model responses.
- [ ] Knowledge F1++. A variant of Knowledge F1 that discounts tokens from the user question or the conversation history when scoring the model response.
- [ ] Faithfulness (RAGAS). Uses an LLM to extract the statements in the model response, and then determines whether those statements can be inferred from the given contexts.
- [ ] FActScore. An LLM-based method that breaks the generated text down into a series of atomic facts, and then evaluates whether each fact is supported by the knowledge source.
- [ ] QUIP-Score. An n-gram overlap measure that quantifies the degree to which a generated passage consists of exact spans found in a text corpus.
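For the lexical metrics, a minimal sketch of the core computation may help frame the discussion. The snippet below implements token-level F1 in the spirit of Knowledge F1, assuming simple lowercased whitespace tokenization (real implementations usually also normalize punctuation and may strip stopwords); the function name and signature are just an illustration, not an API we have decided on.

```python
from collections import Counter

def knowledge_f1(response: str, gold_passage: str) -> float:
    """Token-level F1 between a model response and a gold passage.

    Sketch only: uses lowercased whitespace tokenization; a real
    implementation would normalize punctuation and possibly stopwords.
    """
    resp_tokens = response.lower().split()
    gold_tokens = gold_passage.lower().split()
    if not resp_tokens or not gold_tokens:
        return 0.0
    # Multiset intersection counts shared tokens with multiplicity.
    common = Counter(resp_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(resp_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

The Knowledge F1++ variant would differ only in first removing tokens that also occur in the user question or conversation history before computing the same F1.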