Add support for models with arbitrary dspy programs

A CLI script that uses benchmark.py as an import (not as a CLI).

# benchmark_medprompt.py

import argparse
from benchmark import test
from medprompt import MedpromptModule

# class MedpromptModule(dspy.Module):
#   def __init__(self):
#     ...
#
#   def forward(self, ...):
#     ...

if __name__ == "__main__":
  # ... some argparse code ...

  # Pass the module class and the benchmark name to benchmark.test
  results = test(
    dspy_module=MedpromptModule,
    benchmark="openllm",
  )

  print(results)

Both a prompt-testing script (such as medprompt.py) and benchmark.py can be run as CLI scripts, thanks to the code in their if __name__ == ... blocks, and can also be imported as modules.

Technoculture / med-llm-autoeval

Add support for models with arbitrary dspy programs #3