Anindyadeep opened this issue 9 months ago
Sounds great, please add it in this module: https://github.com/confident-ai/deepeval/tree/main/deepeval/check. The entry point should be the check function in check.py, let me know if you have any questions!
PS. you might want to put the terminal output logic inside the check function to avoid repetitive code.
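To make the suggestion concrete, here is a minimal sketch of what a `check` entry point with centralized terminal output could look like. The thread does not specify the actual signature, so everything here (the registry, `register_benchmark`, the result shape) is an assumption, not deepeval's real API:

```python
# Hypothetical sketch of deepeval/check/check.py; names and the
# registry pattern are assumptions, not the actual deepeval API.
from typing import Callable, Dict

# Registry mapping benchmark names to runner callables (assumption).
BENCHMARK_RUNNERS: Dict[str, Callable[..., dict]] = {}


def register_benchmark(name: str):
    """Decorator that registers a benchmark runner under a name."""
    def decorator(fn):
        BENCHMARK_RUNNERS[name] = fn
        return fn
    return decorator


def check(benchmark: str, **kwargs) -> dict:
    """Single entry point: dispatch to a registered benchmark runner.

    The terminal output logic lives here, so individual runners stay
    free of repetitive printing code (as suggested in the thread).
    """
    if benchmark not in BENCHMARK_RUNNERS:
        raise ValueError(f"Unknown benchmark: {benchmark}")
    results = BENCHMARK_RUNNERS[benchmark](**kwargs)
    for metric, score in results.items():
        print(f"{metric}: {score:.3f}")
    return results


@register_benchmark("toy")
def run_toy(**kwargs):
    # Stand-in runner returning a metric dict (hypothetical shape).
    return {"accuracy": 1.0}
```

With this shape, adding a new benchmark is just another decorated runner function, and `check("toy")` handles both dispatch and printing.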
Sounds good.
Harness is one of the general evaluation frameworks, covering hundreds of tasks and benchmarks across different types of metrics.
General-purpose evaluation of LLMs matters both in research and in pre-production checks. Common tasks such as toxicity detection or summarization evaluation are especially important.
Since Harness is largely CLI-based, integrating it into deepeval as a modular pipeline could be very helpful for running CI checks while fine-tuning LLMs or during pre-production steps.
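Since the harness is CLI-driven, one plausible integration is a thin Python wrapper that assembles the CLI invocation and reads back the results file, so it can slot into a CI pipeline. The flag names below follow lm-evaluation-harness conventions but should be verified against the installed version; the JSON output path handling is an assumption:

```python
# Hypothetical sketch of driving a CLI-based harness from a pipeline
# step. Flag names are assumed from lm-evaluation-harness conventions
# and must be checked against the installed harness version.
import json
import subprocess
from typing import List


def build_harness_command(model_args: str, tasks: List[str],
                          output_path: str) -> List[str]:
    """Assemble the harness CLI invocation (assumed flag names)."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--output_path", output_path,
    ]


def run_harness_check(model_args: str, tasks: List[str],
                      output_path: str) -> dict:
    """Run the harness as a subprocess and load its JSON results.

    Raising on a non-zero exit code (check=True) makes this usable as
    a failing CI step during fine-tuning or pre-production checks.
    """
    cmd = build_harness_command(model_args, tasks, output_path)
    subprocess.run(cmd, check=True)
    with open(output_path) as f:
        return json.load(f)
```

Separating command construction from execution keeps the wrapper testable without actually invoking the harness binary.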
Expected things: