CivicSpleen / ambry

A comprehensive data package manager
BSD 2-Clause "Simplified" License

Bundle.sql should not be execd in build #163

Closed ericbusboom closed 8 years ago

ericbusboom commented 8 years ago

I've commented out https://github.com/CivicKnowledge/ambry/blob/master/ambry/bundle/bundle.py#L1780 in my next commit because it's causing problems, partly from the requirement for apsw (which should only be required when using a SQLite warehouse).

When building with a generator that references a SQL query, the build should always use a SQLite warehouse that is specific to that build. So the warehouse should be constructed before iteration starts on the first source that uses a SQL query.

The best thing to do is probably to create a new etl.pipeline.SqlSourcePipe, similar to etl.pipeline.GeneratorSourcePipe. When the SqlSourcePipe is initialized, look for a warehouse file in the build directory (bundle.build_fs) and create it if it does not exist. Then exec the bundle.sql file and load any required partitions into it.
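A rough sketch of that initialization, not ambry code: it uses only the standard-library sqlite3 module, and the class name SqlSourcePipeSketch, the helper ensure_warehouse, and the file name warehouse.db are all made up for illustration. It assumes bundle.sql only needs to run when the warehouse file is first created, and it leaves out partition loading entirely.

```python
import os
import sqlite3


def ensure_warehouse(build_dir, sql_path, filename="warehouse.db"):
    """Open the build-local warehouse, creating it from bundle.sql if missing.

    `filename` is a hypothetical name; the issue does not specify one.
    Returns (connection, created_flag).
    """
    os.makedirs(build_dir, exist_ok=True)
    db_path = os.path.join(build_dir, filename)
    created = not os.path.exists(db_path)

    conn = sqlite3.connect(db_path)
    if created:
        # Assumption: the SQL file is executed only for a fresh warehouse.
        with open(sql_path) as f:
            conn.executescript(f.read())
        conn.commit()
    return conn, created


class SqlSourcePipeSketch(object):
    """Hypothetical stand-in for the proposed etl.pipeline.SqlSourcePipe."""

    def __init__(self, build_dir, sql_path, query):
        # The warehouse exists before the first row is ever requested.
        self.conn, self.created = ensure_warehouse(build_dir, sql_path)
        self.query = query

    def __iter__(self):
        # Yield result rows of the source's SQL query as tuples.
        for row in self.conn.execute(self.query):
            yield row
```

Doing the work in the constructor means the warehouse is ready before iteration of the first SQL-backed source begins, which is the ordering asked for above.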

In the warehouse, save the last-modified time of the bundle.sql file, and regenerate the warehouse if the bundle.sql file has changed. Since the only other thing that goes into the warehouse is the partitions, and those are all referenced from outside, there should be no need to alter the warehouse unless the bundle.sql file changes.
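One way the modification-time check could look, again as a sketch with the standard-library sqlite3 module rather than ambry's own warehouse API. The table name _warehouse_meta, the key bundle_sql_mtime, and the three function names are assumptions made for this example.

```python
import os
import sqlite3

# Hypothetical metadata table kept inside the warehouse itself.
META_DDL = (
    "CREATE TABLE IF NOT EXISTS _warehouse_meta "
    "(key TEXT PRIMARY KEY, value TEXT)"
)


def stored_sql_mtime(conn):
    """Return the recorded bundle.sql mtime, or None if never recorded."""
    conn.execute(META_DDL)
    row = conn.execute(
        "SELECT value FROM _warehouse_meta WHERE key = 'bundle_sql_mtime'"
    ).fetchone()
    return float(row[0]) if row else None


def record_sql_mtime(conn, sql_path):
    """Save the current mtime of bundle.sql in the warehouse."""
    conn.execute(META_DDL)
    conn.execute(
        "INSERT OR REPLACE INTO _warehouse_meta (key, value) "
        "VALUES ('bundle_sql_mtime', ?)",
        (repr(os.path.getmtime(sql_path)),),
    )
    conn.commit()


def warehouse_is_stale(conn, sql_path):
    """True if bundle.sql changed (or was never recorded) since the last build."""
    stored = stored_sql_mtime(conn)
    return stored is None or stored != os.path.getmtime(sql_path)
```

A caller would check warehouse_is_stale before iterating; if it returns True, delete and rebuild the warehouse file, then call record_sql_mtime. Comparing for inequality rather than "newer than" also catches a bundle.sql restored from an older copy.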

nmb10 commented 8 years ago

Done. Now it executes before ingest.