WPRDC / wprdc-etl

MIT License

Start developing extraction processes #9

Closed. saylorsd closed this issue 8 years ago.

bsmithgall commented 8 years ago

Related to #2. I think the first step here should be to upload sample data for a few different pipelines, so that we can work out the best way to run a full process. I think the full process should probably look like this:

  1. A new Pipeline is initialized.
  2. It's given a FileExtractor, which streams the file line by line.
  3. Incoming data is regularized into a Schema (we can use the marshmallow library for this). These schemas are then added to some in-memory data store.
  4. Finally, the data is pushed up using the DataPusher, refactored to be an implementation of some loader class.
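A minimal, stdlib-only sketch of how the four steps above could fit together. Everything here is hypothetical: `Pipeline`, `FileExtractor`, `Loader`, `InMemoryLoader`, and the `PermitSchema` fields are illustrative names, not existing code. `SimpleSchema` is a stand-in for a marshmallow `Schema` so the example runs without third-party packages, and `InMemoryLoader` just collects records where a real loader would push them up through the DataPusher.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterator, List


class FileExtractor:
    """Hypothetical extractor: streams a CSV file-like object row by row."""

    def __init__(self, fileobj):
        self.fileobj = fileobj

    def extract(self) -> Iterator[Dict[str, str]]:
        # csv.DictReader reads lazily, so rows are yielded one line at a time
        yield from csv.DictReader(self.fileobj)


class SimpleSchema:
    """Stand-in for a marshmallow Schema: field name -> converter callable."""

    fields: Dict[str, Callable[[str], Any]] = {}

    def load(self, row: Dict[str, str]) -> Dict[str, Any]:
        # Keep only the declared fields and coerce each to its declared type
        return {name: convert(row[name]) for name, convert in self.fields.items()}


class Loader(ABC):
    """Base loader class; a refactored DataPusher would be one implementation."""

    @abstractmethod
    def load(self, records: List[Dict[str, Any]]) -> None:
        ...


class InMemoryLoader(Loader):
    """Hypothetical loader that stores records instead of pushing them."""

    def __init__(self):
        self.pushed: List[Dict[str, Any]] = []

    def load(self, records: List[Dict[str, Any]]) -> None:
        self.pushed.extend(records)


class Pipeline:
    """Hypothetical pipeline: extract -> schema -> in-memory store -> loader."""

    def __init__(self):
        self._extractor = None
        self._schema = None
        self._loader = None
        self.data: List[Dict[str, Any]] = []  # the in-memory data store

    def extract(self, extractor: FileExtractor) -> "Pipeline":
        self._extractor = extractor
        return self

    def schema(self, schema: SimpleSchema) -> "Pipeline":
        self._schema = schema
        return self

    def load(self, loader: Loader) -> "Pipeline":
        self._loader = loader
        return self

    def run(self) -> "Pipeline":
        # Regularize each streamed row, hold it in memory, then hand off to the loader
        for row in self._extractor.extract():
            self.data.append(self._schema.load(row))
        self._loader.load(self.data)
        return self


class PermitSchema(SimpleSchema):
    # Hypothetical fields for a sample dataset; the "notes" column is dropped
    fields = {"permit_id": str, "fee": float}


raw = io.StringIO("permit_id,fee,notes\nA1,10.5,x\nB2,3,y\n")
loader = InMemoryLoader()
pipeline = Pipeline().extract(FileExtractor(raw)).schema(PermitSchema()).load(loader).run()
print(loader.pushed)  # [{'permit_id': 'A1', 'fee': 10.5}, {'permit_id': 'B2', 'fee': 3.0}]
```

Each setter returns `self`, so the steps can be chained in the order listed above. Keeping the loader behind an abstract `Loader` base is what would let a refactored DataPusher slot in as one implementation among several.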