CivicSpleen / ambry

A comprehensive data package manager
BSD 2-Clause "Simplified" License

Process Improvements #156

Closed ericbusboom closed 8 years ago

ericbusboom commented 8 years ago

Improve the build process.

The build process should be self-contained, and not require running the ingest and schema phases first.

Ingest and schema should only need to be run during development.

Run ingest to load the files and create statistics. Then run the schema processes: one to create the source schemas and one to create the destination schemas.

The source schemas are optional; they are only required if there are fixed-width files or columns that need new names. The source schemas are built from the ingested files.

Not all of the files need to be ingested -- just enough to create the schemas.

The build process is self-contained. It starts with the source files, not the ingested files.
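
A rough sketch of the phase split described above, to make the proposal concrete. Every name here (`Source`, `needs_source_schema`, `development_phases`, `build_phases`, `sample_size`) is hypothetical and chosen for illustration; none of it is ambry's actual API. It only plans which phases would run on which files.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical description of one source file (not an ambry class)."""
    name: str
    fixed_width: bool = False
    renames: dict = field(default_factory=dict)  # old column name -> new name


def needs_source_schema(source):
    """A source schema is only required for fixed-width files or renamed columns."""
    return source.fixed_width or bool(source.renames)


def development_phases(sources, sample_size=1):
    """Plan the development-only phases: ingest a sample, then create schemas.

    Only the first `sample_size` files are ingested, on the assumption that
    a subset is enough to create the schemas.
    """
    sample = sources[:sample_size]
    phases = [("ingest", [s.name for s in sample])]
    # Source schemas are built from the ingested files, and only where needed.
    with_schema = [s.name for s in sample if needs_source_schema(s)]
    if with_schema:
        phases.append(("source_schema", with_schema))
    phases.append(("dest_schema", [s.name for s in sample]))
    return phases


def build_phases(sources):
    """Plan the build: it starts from the source files and needs no prior ingest."""
    return [("build", [s.name for s in sources])]
```

For example, given three sources where only the second is fixed-width, `development_phases(sources, sample_size=2)` would plan an ingest of the first two files, a source schema for the fixed-width one, and destination schemas for both, while `build_phases(sources)` would cover all three files directly.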