epic-open-source / seismometer

AI model evaluation with a focus on healthcare
https://epic-open-source.github.io/seismometer/
BSD 3-Clause "New" or "Revised" License

Example notebooks should put data loading into a single function call. #105

Open gbowlin opened 1 month ago

gbowlin commented 1 month ago

_Originally posted by @diehlbw in https://github.com/epic-open-source/seismometer/pull/99#discussion_r1812702070_

Basically: given a URL, download all of the requisite files based on our general expectations for how an example dataset is laid out.
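
A minimal sketch of the "general expectations" part. It assumes the six-file layout used for diabetes-v2 later in this thread is the standard one for every example dataset. `EXPECTED_FILES` and `plan_downloads` are hypothetical names, not part of seismometer. The helper only resolves each expected file to a remote URL and a local path, and leaves the actual download to the caller.

```python
from pathlib import Path

# Hypothetical: files every example dataset is assumed to ship with,
# mirroring the diabetes-v2 list from this issue.
EXPECTED_FILES = (
    "config.yml",
    "usage_config.yml",
    "data_dictionary.yml",
    "data/predictions.parquet",
    "data/events.parquet",
    "data/metadata.json",
)


def plan_downloads(base_url: str, dest: str = ".") -> list[tuple[str, Path]]:
    """Map each expected file to a (remote URL, local path) pair under `dest`."""
    base = base_url.rstrip("/")  # tolerate a trailing slash on the base URL
    return [(f"{base}/{name}", Path(dest) / name) for name in EXPECTED_FILES]
```

Keeping the layout in one place means a new example dataset only has to follow the convention, and the download step stays a plain loop over the returned pairs.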

gbowlin commented 1 month ago
```python
# Download dataset
import urllib.request
from pathlib import Path

SOURCE_REPO = "epic-open-source/seismometer-data"
BRANCH_NAME = "main"
DATASET_SOURCE = f"https://raw.githubusercontent.com/{SOURCE_REPO}/refs/heads/{BRANCH_NAME}/diabetes-v2"
# Files the example expects, relative to DATASET_SOURCE
files = [
    "config.yml",
    "usage_config.yml",
    "data_dictionary.yml",
    "data/predictions.parquet",
    "data/events.parquet",
    "data/metadata.json",
]
# The data files live in a data/ subfolder, so create it before downloading
Path("data").mkdir(parents=True, exist_ok=True)
for file in files:
    # Save each file at the same relative path locally
    _ = urllib.request.urlretrieve(f"{DATASET_SOURCE}/{file}", file)
```

This could be reduced to a single call like:

```python
import seismometer as sm
sm.download_sample_dataset(source_repo="epic-open-source/seismometer-data", branch="main")
```

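
One way the proposed call could behave, sketched as a standalone function rather than seismometer's actual API. The `dataset_name` and `dest` parameters are hypothetical additions (the proposal only names `source_repo` and `branch`), and the URL pattern is assumed to match the raw.githubusercontent.com form used in the snippet earlier in this thread.

```python
import urllib.request
from pathlib import Path

# Hypothetical: assumed file layout shared by example datasets (from this issue).
EXPECTED_FILES = (
    "config.yml",
    "usage_config.yml",
    "data_dictionary.yml",
    "data/predictions.parquet",
    "data/events.parquet",
    "data/metadata.json",
)


def download_sample_dataset(
    source_repo: str = "epic-open-source/seismometer-data",
    branch: str = "main",
    dataset_name: str = "diabetes-v2",  # hypothetical parameter
    dest: str = ".",  # hypothetical parameter
) -> list[Path]:
    """Hypothetical helper: fetch every expected file of a sample dataset into `dest`."""
    base_url = f"https://raw.githubusercontent.com/{source_repo}/refs/heads/{branch}/{dataset_name}"
    downloaded = []
    for name in EXPECTED_FILES:
        target = Path(dest) / name
        # Create parent folders (e.g. data/) per file instead of hard-coding them.
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(f"{base_url}/{name}", str(target))
        downloaded.append(target)
    return downloaded
```

Creating each file's parent directory inside the loop means a dataset with a different subfolder layout would not need a separate `mkdir` call, and returning the local paths lets a notebook confirm what was fetched.
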
diehlbw commented 1 month ago

While this isn't a functional change, I'm labeling it as Important to ensure it gets done (and the example cleaned up a little) before v0.3.