Technoculture / med-llm-autoeval

Automatically evaluate your LLMs in Google Colab
MIT License

Add medical evals #1

Status: Closed (closed by sutyum 9 months ago)

sutyum commented 9 months ago

Python code

Medical Competence

  1. MedMCQA
  2. PubMedQA
  3. MedQA
  4. MedicationQA
  5. MMLU Medical

Reasoning

  1. ARC
  2. HellaSwag
  3. Winogrande
  4. GSM8K
  5. TruthfulQA
  6. BLURB: Biomedical Language Understanding and Reasoning Benchmark

The final code is a CLI that tests a given model on a given list of evals. The CLI is called by the runpod.sh script, which in turn is called by the runpod template after the GPU instance boots up.
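A minimal sketch of what such a CLI could look like, using only argparse from the standard library. The benchmark names come from the lists above; the flag names (`--model`, `--evals`), the group keys (`medical`, `reasoning`), and the helper names are assumptions for illustration, not the repository's actual interface. The loop only prints what it would run.

```python
import argparse

# Benchmark names taken from the lists in this issue. The group keys and the
# lowercase identifiers are hypothetical, chosen only for this sketch.
EVAL_GROUPS = {
    "medical": ["medmcqa", "pubmedqa", "medqa", "medicationqa", "mmlu_medical"],
    "reasoning": ["arc", "hellaswag", "winogrande", "gsm8k", "truthfulqa", "blurb"],
}
ALL_EVALS = [name for names in EVAL_GROUPS.values() for name in names]


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser (flag names are assumptions)."""
    parser = argparse.ArgumentParser(description="Run a model on a list of evals.")
    parser.add_argument("--model", required=True, help="Model identifier to evaluate.")
    parser.add_argument(
        "--evals",
        nargs="+",
        default=ALL_EVALS,
        choices=ALL_EVALS + list(EVAL_GROUPS),
        help="Benchmark names or group names; defaults to every benchmark.",
    )
    return parser


def resolve_evals(requested):
    """Expand group names into benchmark names, keeping order and dropping duplicates."""
    resolved = []
    for item in requested:
        for name in EVAL_GROUPS.get(item, [item]):
            if name not in resolved:
                resolved.append(name)
    return resolved


def main(argv=None):
    """Parse arguments and return the ordered list of benchmarks to run."""
    args = build_parser().parse_args(argv)
    evals = resolve_evals(args.evals)
    for name in evals:
        # Placeholder: a real implementation would launch the benchmark here.
        print(f"would run {name} on {args.model}")
    return evals
```

For example, `main(["--model", "some-model", "--evals", "medical", "gsm8k"])` would expand the `medical` group and append `gsm8k`. A script such as runpod.sh could pass the same arguments on the command line once an entry point is wired up.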


Finally, if code like the following runs to completion in a Colab notebook, we then test it with runpod in debug mode.

%env DEBUG=DEBUG
%env GITHUB_API_TOKEN=GITHUB_API_TOKEN
...

!git clone https://github.com/Technoculture/med-llm-autoeval.git
!bash ./med-llm-autoeval/runpod.sh
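
In a notebook, each `!` line runs in its own shell, so variables have to reach the script through the notebook process's environment or be passed explicitly. Below is a rough sketch of the explicit route from Python. The helper name `run_with_env` is hypothetical, the values are the same placeholders as above, and the child command is a stand-in that just echoes one variable back.

```python
import os
import subprocess
import sys


def run_with_env(command, extra_env):
    """Run a command with extra environment variables and return its stdout."""
    # Merge the placeholders on top of the current environment for the child only.
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


# Placeholder values, as in the notebook lines above.
settings = {"DEBUG": "DEBUG", "GITHUB_API_TOKEN": "GITHUB_API_TOKEN"}

# Stand-in child process that prints the DEBUG variable it received.
child = [sys.executable, "-c", "import os; print(os.environ['DEBUG'])"]
output = run_with_env(child, settings)
```

After cloning the repository, the stand-in command could be swapped for `["bash", "./med-llm-autoeval/runpod.sh"]` so that the script sees the same variables.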

sutyum commented 9 months ago

huggingface-cli login --token $HF_TOKEN