droher / boxball

Prebuilt Docker images with Retrosheet's complete baseball history data for many analytical frameworks. Includes Postgres, cstore_fdw, MySQL, SQLite, Clickhouse, Drill, Parquet, and CSV.
Apache License 2.0

Separate data files from image #49

Open droher opened 4 years ago

droher commented 4 years ago

The data files are currently baked into the image for each database. This is not ideal: the data ends up as a single huge layer in every image, and Docker can't share that layer between images even though the data is identical. Instead, the data should be left out of each image and downloaded when the container starts. The download should go first to a volume shared between the images, so that it is cached and only has to be downloaded once, no matter how many images use it.
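As a rough illustration of the download-on-start idea, here is a minimal Python sketch of what a container entrypoint could call before loading the database. It is not taken from the boxball codebase: the function names `ensure_data` and `fetch_url`, the cache directory, and the URL are all hypothetical placeholders. It assumes the shared volume is mounted at the same path in every container.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def fetch_url(url: str, dest: Path) -> None:
    """Default downloader (hypothetical helper): stream the URL to dest in 1 MiB chunks."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        while chunk := resp.read(1 << 20):
            out.write(chunk)


def ensure_data(url: str, cache_dir: Path, filename: str, fetch=fetch_url) -> Path:
    """Return the cached data file, downloading it only if it is not already there.

    cache_dir is assumed to be a volume shared between containers, so the
    first container to start pays for the download and the rest reuse it.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if target.exists():
        return target  # cache hit: another run already fetched the file

    # Download to a temp file in the same directory, then rename into place,
    # so a partially downloaded file is never mistaken for a complete one.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, suffix=".part")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        fetch(url, tmp)
        os.replace(tmp, target)
    finally:
        tmp.unlink(missing_ok=True)  # clean up if the download failed
    return target
```

An entrypoint might call something like `ensure_data(DATA_URL, Path("/shared-data"), "retrosheet.tar")`, where both the URL and the mount path are placeholders. The sketch does not lock the cache, so two containers starting at the same moment could both download the file; the rename keeps the result consistent, but a real implementation would probably want a lock file.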

As a prerequisite, I'll need to make the data upload (to OneDrive or wherever it ends up being hosted) part of the build instead of something I do manually.