CivicSpleen / ambry

A comprehensive data package manager
BSD 2-Clause "Simplified" License

Loading SQL Sources #184

Closed ericbusboom closed 8 years ago

ericbusboom commented 8 years ago

Some improvements to the design for SQL sources.

Distinguish build from production warehouses

There are two main use cases for SQL: using SQL queries to build a bundle, and using SQL in a warehouse setting, such as with a web application.

If the library is SQLite, the warehouse is SQLite, stored in the bundle's build directory.

If the library is Postgres, the warehouse is Postgres.

Loading bundle.sql, define sources

Create comments that indicate which source the SQL statements apply to. The source comment is stateful: every SQL statement after it is associated with the most recent source comment. All of the SQL statements for a source are stored with the source record (as a list of statements in the data field). SQL statements placed before the first source comment are collected together and stored outside of any source, maybe as a bundle config.
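A minimal sketch of the stateful grouping described above. The issue does not specify the comment syntax, so the `-- source: <name>` marker, the function names, and the naive "a statement ends on a line ending in `;`" rule are all assumptions made for illustration, not ambry's actual parser.

```python
import re

# Hypothetical marker syntax; the issue does not define one.
SOURCE_COMMENT = re.compile(r"^\s*--\s*source:\s*(\S+)\s*$", re.IGNORECASE)


def _flush(pending, target):
    """Join buffered lines into one statement and append it to target."""
    statement = "\n".join(pending).strip()
    if statement:
        target.append(statement)
    del pending[:]


def split_by_source(sql_text):
    """Return (bundle_statements, source_statements).

    bundle_statements: statements that appear before any source comment.
    source_statements: dict mapping source name to its list of statements.
    """
    bundle_statements = []
    source_statements = {}
    current = bundle_statements  # statements go here until a marker appears
    pending = []

    for line in sql_text.splitlines():
        match = SOURCE_COMMENT.match(line)
        if match:
            # Anything still buffered belongs to the previous target.
            _flush(pending, current)
            current = source_statements.setdefault(match.group(1), [])
            continue
        pending.append(line)
        # Simplification: treat a trailing semicolon as end of statement.
        if line.rstrip().endswith(";"):
            _flush(pending, current)

    _flush(pending, current)
    return bundle_statements, source_statements
```

The per-source lists returned here correspond to what would be stored in each source record's data field; the first list is the unaffiliated SQL.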

When building or ingesting, the bundle first executes the SQL statements that are not affiliated with a source. Then, just before a SQL source is executed, the source's SQL statements are executed.
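The ordering could be sketched roughly as follows, using the standard-library `sqlite3` module. The function `run_build`, its parameters, and the `run_source` callback are hypothetical names for illustration; they are not part of ambry's API.

```python
import sqlite3


def run_build(connection, bundle_statements, source_statements, run_source):
    """Execute SQL in the order described: unaffiliated SQL first,
    then each source's SQL immediately before that source runs.

    source_statements maps a source name to its list of statements.
    run_source(connection, name) is a caller-supplied callback that
    builds or ingests the named source.
    """
    executed = []

    # 1. Statements not affiliated with any source.
    for statement in bundle_statements:
        connection.execute(statement)
        executed.append(("bundle", statement))

    # 2. For each source: its own statements, then the source itself.
    for name, statements in source_statements.items():
        for statement in statements:
            connection.execute(statement)
            executed.append((name, statement))
        run_source(connection, name)

    return executed
```

For example, with an in-memory database opened via `sqlite3.connect(":memory:")`, a bundle-level `CREATE TABLE` would run before any source's `INSERT` statements, and each source callback would see only the rows inserted up to that point.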

Initial Partitions

Before the SQL is first loaded, the database is initialized with FDW for all of the partitions referenced in sources.csv. This establishes the initial partitions that later SQL entries can work from.