BLURB: Biomedical Language Understanding and Reasoning Benchmark
The final code provides a CLI for testing a given model on a given list of evals. This CLI is called by the runpod.sh script, which in turn is called by the RunPod template after the GPU instance boots up.
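As a rough illustration of the CLI described above, here is a minimal sketch using Python's standard argparse module. The flag names (--model, --evals), the eval names, and the placeholder scoring function are all assumptions made for this example; they are not taken from the project's actual code.

```python
import argparse
from typing import Callable, Dict, List, Optional


def _placeholder_eval(model_name: str) -> float:
    """Stand-in for a real eval (hypothetical); always returns 0.0."""
    return 0.0


# Hypothetical registry mapping eval names to scoring functions.
# A real project would register its actual benchmark tasks here.
EVALS: Dict[str, Callable[[str], float]] = {
    "medical_competence": _placeholder_eval,
    "reasoning": _placeholder_eval,
}


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser; flag names are assumed for illustration."""
    parser = argparse.ArgumentParser(
        description="Test a given model on a given list of evals."
    )
    parser.add_argument("--model", required=True, help="Model identifier.")
    parser.add_argument(
        "--evals",
        nargs="+",
        required=True,
        choices=sorted(EVALS),
        help="One or more eval names to run.",
    )
    return parser


def run(argv: Optional[List[str]] = None) -> Dict[str, float]:
    """Parse arguments and run each requested eval against the model.

    Passing argv=None makes argparse read from sys.argv, which is what a
    shell script would rely on; passing a list is convenient in a notebook.
    """
    args = build_parser().parse_args(argv)
    return {name: EVALS[name](args.model) for name in args.evals}
```

In a notebook, the same entry point can be exercised directly, for example with `run(["--model", "demo-model", "--evals", "reasoning"])`, before a shell script calls it with real command-line arguments.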
Finally, if code like the following runs to completion in a Colab notebook, we then test it with RunPod in debug mode.
Python code
Medical Competence
Reasoning