Store pipeline output in compressed CSVs and read/write them using `data.table`

nmdefries commented 11 months ago

Closes https://github.com/cmu-delphi/forecast-eval/issues/262

Switch from RDS to the csv.gz format. CSV is more flexible: more R packages can read and write it, and so can other languages (e.g. if we want to rewrite the pipeline in Python but keep the dashboard in R).
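To illustrate the flexibility point, here is a minimal sketch of writing a gzipped CSV with `data.table::fwrite` and reading it back with base R only. The table contents, column names, and file name are hypothetical and not taken from the pipeline.

```r
library(data.table)

# Hypothetical score table; the real pipeline's columns may differ.
scores <- data.table(
  forecaster = c("a", "a", "b"),
  ahead = c(1L, 2L, 1L),
  wis = c(0.5, 0.75, 0.25)
)

# Hypothetical file name. With the default compress = "auto",
# fwrite gzips the output when the file name ends in ".gz".
path <- file.path(tempdir(), "score_cards_example.csv.gz")
fwrite(scores, path)

# Base R can read the gzipped file directly, without data.table.
scores_base <- read.csv(path)
```

Because the file is ordinary gzipped CSV, readers other than `data.table` should work on it too.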

`fread` only speeds up the dashboard slightly, but it opens the door to faster `data.table` processing in the future.
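As a rough sketch of what that could look like, the example below reads a gzipped CSV with `fread` and runs a grouped aggregation in `data.table` syntax. The columns and file name are hypothetical, and note that `fread` needs the `R.utils` package installed to read `.gz` files.

```r
library(data.table)

# Hypothetical score table and file name, for illustration only.
scores <- data.table(
  forecaster = c("a", "a", "b"),
  ahead = c(1L, 2L, 1L),
  wis = c(0.5, 0.75, 0.25)
)
path <- file.path(tempdir(), "score_cards_example.csv.gz")
fwrite(scores, path)

# fread handles the .gz input itself, provided R.utils is installed.
scores_dt <- fread(path)

# Grouped aggregation in data.table syntax: mean WIS per forecaster.
by_forecaster <- scores_dt[, .(mean_wis = mean(wis)), by = forecaster]
```

Since `fread` returns a `data.table`, later dashboard code could use this kind of grouped operation without a conversion step.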

dsweber2 commented 11 months ago

So if I'm reading this right, to test it I should run `make score_forecast` and `make build_dashboard_dev`?

nmdefries commented 11 months ago

Run `make score_forecast` to test the pipeline and `make start_dashboard` to test the dashboard (`build_dashboard_dev` is run as a dependency).

`make score_forecast` depends on downloading an image from the image repository. A workaround is given in this comment:

# `docker_build/Dockerfile` is based on `ghcr.io/cmu-delphi/covidcast:latest`.
# Docker will try to fetch it from the image repository, which requires
# authentication. As a workaround, locally build a docker image
# from https://github.com/cmu-delphi/covidcast-docker/ using the `make build`
# target, and set `--pull=false` below.

That said, given the difficulties of testing this, I'll run the pipeline in GitHub Actions so that the score files are available in the S3 bucket. This won't affect the public dashboard, since the new extension gives the score files different names, and you can test the dashboard only. I'll let you know when that's done.

Store pipeline output in compressed CSVs and read/write them using `data.table` #308