Is your feature request related to a problem? Please describe.
Right now, the deepeval package has different types of checks, for example:
FactualConsistency check
conceptual similarity
RAG check (using ragas)
However, most of these checks are (or will be) built on some form of atomic or granular metric (for example accuracy, F1, etc.). Hence, we need a function or set of functions that define those metrics at a granular level, so that a user who wants to build a CustomMetric requiring some statistical metric can take it from deepeval itself. That way the package becomes more modular and therefore more extensible.
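To make the idea concrete, a granular metric could be as small as a single pure function that a CustomMetric can import and reuse. A minimal sketch (the function names and layout below are hypothetical, not existing deepeval APIs):

```python
# Purely illustrative sketch -- these names are hypothetical, not existing deepeval APIs.
from collections import Counter


def exact_match_score(prediction: str, target: str) -> float:
    """Return 1.0 if the prediction matches the target exactly, else 0.0."""
    return 1.0 if prediction.strip() == target.strip() else 0.0


def f1_score(prediction_tokens: list[str], target_tokens: list[str]) -> float:
    """Token-level F1 between a prediction and a target (SQuAD-style)."""
    common = Counter(prediction_tokens) & Counter(target_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(prediction_tokens)
    recall = num_same / len(target_tokens)
    return 2 * precision * recall / (precision + recall)
```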
Describe the solution you'd like
The solution I have in mind is a separate class called MetricsCalculation (or something similar) that contains the full set of metrics. As far as I have seen in NLP, metrics can be broadly categorized into two forms:
Metrics that are statistical in nature (they use some kind of mathematical formula to calculate a score). We can term these StatisticalMetrics. Examples include:
a. BLEU score
b. Rouge score
c. Exact match score, etc.
Metrics that use an external ML/DL model to calculate the score. We can term these ModelBasedMetrics. Examples include:
a. BERT score
b. PII score
c. Toxicity score, etc.
The taxonomy of metrics varies greatly; however, in my opinion this simple classification is a great starting point (in terms of implementation).
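As a rough sketch of how that two-way split might look in code (class names here are only suggestions, not existing deepeval classes):

```python
from abc import ABC, abstractmethod


class BaseMetric(ABC):
    """Common interface for every granular metric (illustrative only)."""

    @abstractmethod
    def compute(self, prediction: str, target: str) -> float:
        ...


class StatisticalMetric(BaseMetric):
    """Metrics computed from a mathematical formula (BLEU, ROUGE, exact match, ...)."""


class ModelBasedMetric(BaseMetric):
    """Metrics that rely on an external ML/DL model (BERTScore, PII, toxicity, ...)."""


class ExactMatchMetric(StatisticalMetric):
    """Smallest possible statistical metric, as an example."""

    def compute(self, prediction: str, target: str) -> float:
        return 1.0 if prediction.strip() == target.strip() else 0.0
```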
Once implemented, these metrics can be readily reused by other implementations, for example the SummarizationMetrics mentioned in issue #179.
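A higher-level metric could then simply compose the granular pieces, again purely as an illustration (this reuses the hypothetical sketches above):

```python
class SimpleSummarizationMetric(BaseMetric):
    """Hypothetical metric composed from the granular sketches above."""

    def compute(self, prediction: str, target: str) -> float:
        exact = ExactMatchMetric().compute(prediction, target)
        token_f1 = f1_score(prediction.split(), target.split())
        # Simple average of the two granular scores, just to show composition.
        return 0.5 * exact + 0.5 * token_f1
```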