Is your feature request related to a problem? Please describe.
Right now, the deepeval package has different types of checks, for example:
FactualConsistency check
conceptual similarity
RAG check (using ragas)
However, most of these checks are (or will be) built on some form of atomic or granular metric (for example accuracy, F1, etc.). Hence, we need a function or set of functions that define those metrics at a granular level, so that a user who wants to build a CustomMetric requiring some statistical metric can take it from deepeval itself. That way the package becomes more modular and therefore more extensible.
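To make the idea concrete, a granular metric could be as small as a single pure function that a CustomMetric can import and reuse. A minimal sketch (the function names and layout below are hypothetical, not existing deepeval APIs):

```python
# Purely illustrative sketch -- these names are hypothetical, not existing deepeval APIs.
from collections import Counter


def exact_match_score(prediction: str, target: str) -> float:
    """Return 1.0 if the prediction matches the target exactly, else 0.0."""
    return 1.0 if prediction.strip() == target.strip() else 0.0


def f1_score(prediction_tokens: list[str], target_tokens: list[str]) -> float:
    """Token-level F1 between a prediction and a target (SQuAD-style)."""
    common = Counter(prediction_tokens) & Counter(target_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(prediction_tokens)
    recall = num_same / len(target_tokens)
    return 2 * precision * recall / (precision + recall)
```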
Describe the solution you'd like
The solution I have in mind is a separate class called MetricsCalculation (or something similar) that contains the full set of metrics. As far as I have seen in NLP, metrics can be broadly categorized into two forms:
Metrics that are statistical in nature (they use some kind of mathematical formula to calculate a score). We can term these StatisticalMetrics. Examples include:
a. BLEU score
b. Rouge score
c. Exact match score, etc.
Metrics that use an external ML/DL model to calculate the score. We can term these ModelBasedMetrics. Examples include:
a. BERT score
b. PII score
c. Toxicity score, etc.
The taxonomy of metrics varies greatly; however, in my opinion this simple classification is a great starting point (in terms of implementation).
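As a rough sketch of how that two-way split might look in code (class names here are only suggestions, not existing deepeval classes):

```python
from abc import ABC, abstractmethod


class BaseMetric(ABC):
    """Common interface for every granular metric (illustrative only)."""

    @abstractmethod
    def compute(self, prediction: str, target: str) -> float:
        ...


class StatisticalMetric(BaseMetric):
    """Metrics computed from a mathematical formula (BLEU, ROUGE, exact match, ...)."""


class ModelBasedMetric(BaseMetric):
    """Metrics that rely on an external ML/DL model (BERTScore, PII, toxicity, ...)."""


class ExactMatchMetric(StatisticalMetric):
    """Smallest possible statistical metric, as an example."""

    def compute(self, prediction: str, target: str) -> float:
        return 1.0 if prediction.strip() == target.strip() else 0.0
```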
Once implemented, these metrics can be readily reused by other implementations, for example the SummarizationMetrics mentioned in issue #179.
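A higher-level metric could then simply compose the granular pieces, again purely as an illustration (this reuses the hypothetical sketches above):

```python
class SimpleSummarizationMetric(BaseMetric):
    """Hypothetical metric composed from the granular sketches above."""

    def compute(self, prediction: str, target: str) -> float:
        exact = ExactMatchMetric().compute(prediction, target)
        token_f1 = f1_score(prediction.split(), target.split())
        # Simple average of the two granular scores, just to show composition.
        return 0.5 * exact + 0.5 * token_f1
```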