As a publisher, I want to schedule dataflows inside dpp server so that I don't have to use external applications to run my jobs on schedule.
Acceptance Criteria
[x] Top 10 automated datasets from #264 are running on schedule via dpp server
Tasks
[x] Do analysis
[x] Create repository that runs dpp server (dataflows-server)
[x] Dockerfile
Based off frictionlessdata/datapackage-pipelines:latest
Installs all dependencies
[x] README
[x] .travis.yml
Use .dockerignore to make sure that the image stays small and unwanted files (e.g. actual datapackages) are not included in it
Alternative to .dockerignore: delete some files in the Travis script
[x] Add git submodules for all dataset repos
[x] Deploy service to Kubernetes
[x] Update nginx
[ ] api.datahub.io/dataflows (?)
[x] api.datahub.io/factory
[x] Add pipeline specs, options:
[x] (1) Add pipeline spec in each dataset repository
[ ] (2) Create a generator that iterates over all submodule directories, locates the flow.py file (or whatever it is called) and yields a matching pipeline spec (with the flow + dump_to_path + push_to_datahub steps); see the generator sketch below the task list
[x] push_to_datahub processor
[x] Gets the datapackage path from its parameters and pushes it to DataHub using data-cli; see the processor sketch below the task list
[x] Schedule datasets via server
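For option (2) above, a minimal sketch of the generator idea in plain Python (not wired into dpp's actual generator plumbing): it walks the submodule directories, looks for a flow.py, and yields a pipeline id plus a spec with the flow, dump_to_path and push_to_datahub steps. The directory layout, the crontab schedule, the step reference format and the parameter names are all assumptions for illustration.

```python
import os

BASE_DIR = '.'  # repo root containing the dataset submodules (assumption)


def generate_pipelines(base_dir=BASE_DIR):
    """Yield (pipeline_id, pipeline_spec) for every submodule that has a flow.py."""
    for name in sorted(os.listdir(base_dir)):
        dataset_dir = os.path.join(base_dir, name)
        flow_file = os.path.join(dataset_dir, 'flow.py')
        if not os.path.isfile(flow_file):
            continue  # not a dataset submodule, skip it
        out_path = os.path.join('data', name)
        yield name, {
            'schedule': {'crontab': '0 2 * * *'},  # placeholder daily schedule
            'pipeline': [
                # run the dataset's flow (exact reference format is an assumption)
                {'flow': flow_file},
                # write the resulting datapackage to disk
                {'run': 'dump_to_path',
                 'parameters': {'out-path': out_path}},
                # push the dumped datapackage to DataHub
                {'run': 'push_to_datahub',
                 'parameters': {'datapackage': os.path.join(out_path, 'datapackage.json')}},
            ],
        }


if __name__ == '__main__':
    for pipeline_id, spec in generate_pipelines():
        print(pipeline_id, spec)
```

In the real service this logic would presumably be hooked into dpp's generator mechanism (or the specs committed per repo, as in option (1)), but the iteration over submodules would stay the same.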
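And a hedged sketch of the push_to_datahub processor: it reads the datapackage path from the step parameters and shells out to data-cli. The ingest/spew wrapper is datapackage-pipelines' standard processor interface; the parameter name 'datapackage' and the exact data-cli invocation should be treated as assumptions.

```python
import subprocess

from datapackage_pipelines.wrapper import ingest, spew

# Standard dpp processor boilerplate: get parameters, datapackage and resources.
parameters, datapackage, resource_iterator = ingest()

# Path to the dumped datapackage.json (parameter name is an assumption).
datapackage_path = parameters['datapackage']

# Push to DataHub via data-cli; check=True makes the pipeline fail if the push fails.
subprocess.run(['data', 'push', datapackage_path], check=True)

# Pass the datapackage through unchanged so any later steps still work.
spew(datapackage, resource_iterator)
```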
Analysis
This will be a standalone service, not dependent on other services, that runs on a Kubernetes cluster.
How to include all the different dataset repos in the server? Two options:
(1) Create one big repo and include all of them there
(2) Have a repo with git submodules pointing to the other dataset repos
That repo would contain only a Dockerfile and a README, with a build that pushes a new image whenever the Dockerfile changes.
We will probably go with option (2).