Technoculture / med-llm-autoeval

Automatically evaluate your LLMs in Google Colab
MIT License
3 stars, 1 fork

Support for dspy programs #4

Closed: dkshjn closed this 8 months ago

dkshjn commented 8 months ago

closes #3

sutyum commented 8 months ago

A CLI script that imports benchmark.py as a module (rather than invoking it as a CLI).

Inside a repo containing a dspy program (such as medprompt):

pip install git+https://github.com/Technoculture/med-llm-autoeval
# benchmark_medprompt.py

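import argparse

from benchmark import test
from medprompt import MedpromptModule

# class MedpromptModule(dspy.Module):
#     def __init__(self):
#         ...
#
#     def forward(self, ...):
#         ...

if __name__ == "__main__":
    # Hypothetical CLI options; the actual flags were not specified.
    parser = argparse.ArgumentParser(description="Benchmark a dspy program.")
    parser.add_argument("--benchmark", default="openllm")
    args = parser.parse_args()

    # Note the comma between keyword arguments (missing in the first draft).
    results = test(
        dspy_module=MedpromptModule,
        benchmark=args.benchmark,
    )

    print(results)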

Both a prompt-testing script (such as medprompt.py) and benchmark.py can be run as CLI scripts, thanks to the code in their if __name__ == "__main__": blocks, and can also be imported as modules.

sutyum commented 8 months ago

Share a Loom video showing the script working.

dkshjn commented 8 months ago

The following video explains the process. https://www.loom.com/share/6c7e4eed82764d31b4bf4a6a859ac295?sid=160b0889-417f-4604-a758-5488df2b10e1