CivicSpleen / ambry

A comprehensive data package manager
BSD 2-Clause "Simplified" License

Finalize Partition to Partition, copy into bundle. #57

Closed ericbusboom closed 9 years ago

ericbusboom commented 9 years ago

To support more distributed operation, make partitions more self-contained. This will allow a partition to be built entirely on a remote machine or in a separate process, without interacting with the bundle.

When the partition is created, it gets the schema for its table (so partition.table references the partition itself, not the bundle) and a copy of the whole bundle, probably zipped and compressed into a file record. A process operating on the partition then does not need to reference the bundle database at all; it can work entirely off of the partition.
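A minimal sketch of that idea, not ambry's actual API: it assumes the partition is a SQLite database with a hypothetical `files` table acting as the file records, and that the bundle is available as a mapping of file names to bytes. The function names, the `schema.json` and `bundle.zip` record names, and the table layout are all illustrative assumptions.

```python
import io
import json
import sqlite3
import zipfile


def zip_bundle_files(files):
    """Pack a {name: bytes} mapping into a compressed in-memory zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in sorted(files.items()):
            zf.writestr(name, data)
    return buf.getvalue()


def init_partition(conn, table_schema, bundle_files):
    """Copy the table schema and a zipped bundle into the partition database."""
    # Hypothetical "file record" table; the real layout is not specified here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, content BLOB)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?)",
        ("schema.json", json.dumps(table_schema).encode("utf-8")),
    )
    conn.execute(
        "INSERT OR REPLACE INTO files VALUES (?, ?)",
        ("bundle.zip", zip_bundle_files(bundle_files)),
    )
    conn.commit()


def load_partition(conn):
    """Recover the schema and bundle files using only the partition database."""
    rows = dict(conn.execute("SELECT path, content FROM files"))
    schema = json.loads(rows["schema.json"].decode("utf-8"))
    with zipfile.ZipFile(io.BytesIO(rows["bundle.zip"])) as zf:
        bundle_files = {name: zf.read(name) for name in zf.namelist()}
    return schema, bundle_files
```

Here `load_partition` only touches the partition's own connection, which is the property the proposal is after: a remote process needs nothing from the bundle database.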

This will be particularly valuable for distributed operation. The initialized partition can be written to S3, and a message sent to trigger the build. Then, a distributed build system can work on just that bundle, writing the result back to S3.
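The flow could look roughly like the sketch below. It deliberately avoids any real S3 or messaging client: a plain dict stands in for the S3 bucket and a `queue.Queue` stands in for the message service. The function names, the message format, and the `.built` key suffix are assumptions made for illustration only.

```python
import json
import queue


def publish_partition(store, messages, key, partition_bytes):
    """Write the initialized partition to the store and request a build."""
    store[key] = partition_bytes
    messages.put(json.dumps({"action": "build", "key": key}))


def build_next(store, messages, build):
    """Handle one build message: fetch the partition, build it, write it back."""
    msg = json.loads(messages.get_nowait())
    result_key = msg["key"] + ".built"  # hypothetical naming for the result
    store[result_key] = build(store[msg["key"]])
    return result_key
```

The worker in `build_next` sees only the stored partition and the message, so it can run on any machine that can reach the store.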