Technoculture / med-llm-autoeval

Automatically evaluate your LLMs in Google Colab
MIT License

Add support for models with arbitrary dspy programs #3

Closed sutyum closed 8 months ago

sutyum commented 9 months ago

Use benchmark.py to test various model + DSPy program combinations.

sutyum commented 8 months ago

A CLI script that imports benchmark.py as a module (rather than invoking it as a CLI).

# benchmark_medprompt.py

import argparse
from benchmark import test
from medprompt import MedpromptModule

# class MedpromptModule(dspy.Module):
#   def __init__(self):
#     ...
# 
#   def forward(self, ...):
#     ...

if __name__ == "__main__":
  # ... some argparse code ...

  results = test(
    dspy_module=MedpromptModule,
    benchmark="openllm",
  )

  print(results)

Both a prompt-testing script (such as medprompt.py) and benchmark.py can be run as CLI scripts, thanks to the code in their if __name__ == ... blocks, and can also be imported as modules.
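
A minimal sketch of that dual-use layout, assuming nothing about the real benchmark.py beyond what is shown above. Here run_benchmark stands in for the test function imported in the snippet; its body, the EchoModule placeholder, and the --benchmark flag are all hypothetical and only illustrate how one file can serve as both an importable module and a CLI.

```python
# Hypothetical sketch of a file that works both as an import and as a CLI.
# Names and behaviour here are assumptions, not the repository's actual code.
import argparse


class EchoModule:
    """Stand-in for a DSPy program so the sketch runs without dspy installed."""


def run_benchmark(dspy_module, benchmark="openllm"):
    """Placeholder for the real evaluation: only records what would be run."""
    # Assumption: dspy_module is passed as a class, as in the snippet above.
    return {"module": dspy_module.__name__, "benchmark": benchmark}


def build_parser():
    """Build the command-line parser; the --benchmark flag is illustrative."""
    parser = argparse.ArgumentParser(description="Run a benchmark from the shell.")
    parser.add_argument("--benchmark", default="openllm")
    return parser


def main(argv=None):
    """CLI entry point; argv=None makes argparse read sys.argv[1:]."""
    # parse_known_args ignores unrecognised flags instead of exiting.
    args, _unknown = build_parser().parse_known_args(argv)
    results = run_benchmark(dspy_module=EchoModule, benchmark=args.benchmark)
    print(results)
    return results


if __name__ == "__main__":
    # Executed only when the file is run directly, never on import.
    main()
```

Keeping the argparse handling inside main() means importing the file has no side effects, so another script can call run_benchmark(...) directly while the same file still works from the shell.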