Related to #2. I think the first step here should be to upload some sample data for a few different pipelines so that we can develop the best way to do a full process. I think the full thing should probably look like this:

A new `Pipeline` is initialized. It's given a `FileExtractor`. The file is then streamed line by line. Incoming data is regularized into a `Schema` (we can use the marshmallow library for this). These schemas are then added to some in-memory data store and finally pushed up using the `DataPusher`, with the `DataPusher` refactored to be an implementation of some loader class.
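The flow above could be sketched roughly like this. Only the names `Pipeline`, `FileExtractor`, `Schema`, and `DataPusher` come from the proposal; everything else (field names, method signatures, the loader base class) is an assumption, and `RecordSchema` here is a stdlib stand-in for what would be a marshmallow `Schema` in practice:

```python
import io


class RecordSchema:
    """Stand-in for a marshmallow Schema; in practice this would be a
    marshmallow Schema subclass and load() would validate/coerce fields."""

    def load(self, data):
        # Regularize raw fields into a consistent record shape.
        return {"name": str(data["name"]), "value": int(data["value"])}


class FileExtractor:
    """Streams a file-like object line by line."""

    def __init__(self, stream):
        self.stream = stream

    def lines(self):
        for line in self.stream:
            line = line.strip()
            if line:
                yield line


class Loader:
    """Hypothetical base loader interface that DataPusher would implement."""

    def push(self, records):
        raise NotImplementedError


class DataPusher(Loader):
    """Hypothetical pusher; collects records in memory here instead of
    pushing to a real backend."""

    def __init__(self):
        self.pushed = []

    def push(self, records):
        self.pushed.extend(records)


class Pipeline:
    """Ties the pieces together: extract -> regularize -> store -> push."""

    def __init__(self, extractor, schema, loader):
        self.extractor = extractor
        self.schema = schema
        self.loader = loader

    def run(self):
        store = []  # in-memory data store for the regularized records
        for line in self.extractor.lines():
            name, value = line.split(",")  # assumes simple "name,value" lines
            store.append(self.schema.load({"name": name, "value": value}))
        self.loader.push(store)
        return store


sample = io.StringIO("a,1\nb,2\n")
pusher = DataPusher()
Pipeline(FileExtractor(sample), RecordSchema(), pusher).run()
print(pusher.pushed)  # [{'name': 'a', 'value': 1}, {'name': 'b', 'value': 2}]
```

With this shape, swapping the in-memory `DataPusher` for a real backend is just another `Loader` implementation, which is the refactor the last step describes.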