Open gbowlin opened 1 month ago
# Download dataset
import urllib.request
from pathlib import Path
SOURCE_REPO = "epic-open-source/seismometer-data"
BRANCH_NAME = "main"
DATASET_SOURCE = f"https://raw.githubusercontent.com/{SOURCE_REPO}/refs/heads/{BRANCH_NAME}/diabetes-v2"
files = [
    "config.yml",
    "usage_config.yml",
    "data_dictionary.yml",
    "data/predictions.parquet",
    "data/events.parquet",
    "data/metadata.json",
]
Path('data').mkdir(parents=True, exist_ok=True)
for file in files:
    _ = urllib.request.urlretrieve(f"{DATASET_SOURCE}/{file}", file)
This could be reduced to a single call like:
import seismometer as sm
sm.download_sample_dataset(source_repo="epic-open-source/seismometer-data", branch="main")
Although this is not a functional change, I'm labeling it as Important so that it gets done (and the example gets cleaned up a little) before v0.3.
_Originally posted by @diehlbw in https://github.com/epic-open-source/seismometer/pull/99#discussion_r1812702070_
Basically, given a URL, the helper should download all the requisite files based on our general expectations.
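
To make the request concrete, here is a rough sketch of what such a helper might look like. It is not an existing seismometer API: the `dataset`, `files`, `dest`, and `retrieve` parameters, the `DEFAULT_FILES` constant, and the `build_dataset_url` function are all assumptions for illustration. The file list and URL layout are copied from the snippet above.

```python
import urllib.request
from pathlib import Path

# Assumed default layout, copied from the example in this issue.
DEFAULT_FILES = (
    "config.yml",
    "usage_config.yml",
    "data_dictionary.yml",
    "data/predictions.parquet",
    "data/events.parquet",
    "data/metadata.json",
)


def build_dataset_url(source_repo, branch, dataset):
    """Build the raw.githubusercontent.com base URL for a dataset folder."""
    return f"https://raw.githubusercontent.com/{source_repo}/refs/heads/{branch}/{dataset}"


def download_sample_dataset(
    source_repo,
    branch="main",
    dataset="diabetes-v2",  # hypothetical parameter; the folder name used above
    files=DEFAULT_FILES,
    dest=".",
    retrieve=urllib.request.urlretrieve,  # injectable so it can be tested offline
):
    """Download each expected file into dest, creating subdirectories as needed."""
    base_url = build_dataset_url(source_repo, branch, dataset)
    written = []
    for name in files:
        target = Path(dest) / name
        # Create parent folders (e.g. data/) before writing the file.
        target.parent.mkdir(parents=True, exist_ok=True)
        retrieve(f"{base_url}/{name}", str(target))
        written.append(target)
    return written
```

With something like this in place, the notebook cell would shrink to `download_sample_dataset("epic-open-source/seismometer-data", branch="main")`. Deriving the parent directory from each file path also removes the need for the separate `Path('data').mkdir(...)` line.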