Closed: YWen-AI closed this issue 3 months ago
Hey @wywdiablo , this is interesting. We worked on a similar idea during the paper but couldn't quantify the difference effectively. How do you think this metric will help in development?
@shahules786 I believe this metric will be important for certain use cases. For example, in one case we aim to enhance the answer by combining the LLM's inherent knowledge with a new external data source. A challenge arises when the RAG pipeline forms an answer based solely on the data we provided. Conversely, if we use a different prompt template in the RAG, such as "You are a professor in the XXX domain," it seems to activate only the LLM's inherent knowledge rather than drawing from the external data source. There should be a metric that informs the developer when this happens, allowing for better control.
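To make the idea concrete, here is a minimal sketch of how such a comparison could work. Everything below is hypothetical (the function names, the token-overlap similarity, and the "answer the LLM gives without retrieved context" input are all assumptions, not part of ragas): the score is positive when the RAG answer is closer to the ground truth than the LLM-only answer, suggesting the external data actually contributed.

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between token sets.

    A deliberately simple stand-in for a real semantic similarity
    measure (e.g. embedding cosine similarity).
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def knowledge_gain(rag_answer: str, llm_only_answer: str, ground_truth: str) -> float:
    """Hypothetical metric: how much closer to the ground truth the
    RAG answer is compared with an answer generated without retrieval.

    > 0  -> the external data source improved the answer
    ~ 0  -> retrieval made no measurable difference
    < 0  -> retrieval made the answer worse
    """
    return token_overlap(rag_answer, ground_truth) - token_overlap(llm_only_answer, ground_truth)
```

In practice the two similarity calls would be replaced by whatever answer-correctness scoring the evaluation framework already uses; the point is only the differential structure of the metric.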
It would be beneficial to have an evaluation metric that measures the improvement the RAG pipeline brings over the LLM alone. This metric should perform the following:
Inputs required: