confident-ai / deepeval

The LLM Evaluation Framework
https://docs.confident-ai.com/
Apache License 2.0

Integration of LM Evaluation Harness. #332

Open · Anindyadeep opened this issue 9 months ago

Anindyadeep commented 9 months ago

LM Evaluation Harness is one of the most general evaluation frameworks, covering hundreds of tasks and benchmarks across many different metrics.

General evaluation of LLMs on broad tasks matters a great deal, both during research and for pre-production checks. Evaluating LLMs on tasks like toxicity or summarization is especially important.

Since the Harness is largely CLI based, integrating it into deepeval as a modular pipeline could be very helpful for running CI checks while fine-tuning LLMs or during pre-production steps; a rough sketch of driving the harness programmatically follows the list below.

Expected things:

  1. Terminal-based output
  2. DeepEval check
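
A minimal sketch of the programmatic call that such a pipeline would wrap, assuming lm-eval-harness exposes `evaluator.simple_evaluate` as in its Python API; the backend name, task names, and argument defaults vary between harness versions and are only illustrative here:

```python
# Sketch only: drives lm-eval-harness from Python instead of the CLI.
# Assumes lm-eval's evaluator.simple_evaluate API; names may differ by version.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                     # assumed HF causal-LM backend name
    model_args="pretrained=gpt2",          # any Hugging Face checkpoint
    tasks=["hellaswag", "truthfulqa_mc"],  # example benchmark tasks
    num_fewshot=0,
    limit=100,                             # subsample for quick CI runs
)

# results["results"] maps each task to its metric dict (e.g. accuracy).
for task, metrics in results["results"].items():
    print(task, metrics)
```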
penguine-ip commented 9 months ago

Sounds great, please add it in this module: https://github.com/confident-ai/deepeval/tree/main/deepeval/check. The entry point should be the check function in check.py, let me know if you have any questions!

P.S. You might want to put the terminal output logic inside the check function to avoid repetitive code.
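
A hypothetical sketch of what that `check` entry point could look like, with the terminal output folded in as suggested. The signature, the `thresholds` parameter, and the metric key are placeholders for illustration, not deepeval's actual API, and the harness call is assumed as above:

```python
# Hypothetical shape of the proposed entry point in deepeval/check/check.py.
# Thresholds, task names, metric keys, and the harness call are placeholders.
from typing import Dict

from lm_eval import evaluator  # assumed lm-eval-harness Python API


def check(model_args: str, thresholds: Dict[str, float]) -> bool:
    """Run harness tasks, print a terminal summary, and return pass/fail."""
    results = evaluator.simple_evaluate(
        model="hf-causal",
        model_args=model_args,
        tasks=list(thresholds.keys()),
    )["results"]

    passed = True
    for task, minimum in thresholds.items():
        score = results[task].get("acc", 0.0)  # metric key varies per task
        status = "PASS" if score >= minimum else "FAIL"
        passed &= score >= minimum
        print(f"{task:<20} acc={score:.3f} (>= {minimum:.3f})  {status}")
    return passed


if __name__ == "__main__":
    # Example: fail a CI run if hellaswag accuracy drops below 0.30.
    ok = check("pretrained=gpt2", {"hellaswag": 0.30})
    raise SystemExit(0 if ok else 1)
```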

Anindyadeep commented 9 months ago

> Sounds great, please add it in this module: https://github.com/confident-ai/deepeval/tree/main/deepeval/check. The entry point should be the check function in check.py, let me know if you have any questions!
>
> P.S. You might want to put the terminal output logic inside the check function to avoid repetitive code.

Sounds good.