datahq / dataflows

DataFlows is a simple, intuitive, lightweight framework for building data processing flows in Python.
https://dataflows.org
MIT License

suggested features: dataflows CLI + auto-numbered checkpoints #48

Closed (OriHoch closed 5 years ago)

OriHoch commented 5 years ago

Example shell session using suggested features for dataflows CLI and checkpoints:

$ dataflows load "/foo/bar/datapackage.json" | dataflows ./my-flow.py:my_step "arg_a" "arg_b" | dataflows printer
FOO | BAR
-------|-------
aaa  | ccc
^^^^^^^^^^^

$ dataflows ./my-flow.py:my_other_step | dataflows checkpoint
Saved checkpoint 1

$ dataflows checkpoint 1 | dataflows join --source_name=foo --source_key='["my_id"]' --source_delete=false --target_name=bar --target_key='["my_id"]' --fields='{"baz": {}}' | dataflows checkpoint
Loading from checkpoint 1
Saving to checkpoint 2

$ dataflows checkpoint last | dataflows printer
Loading from checkpoint 2
FOO | BAR
-------|-------
aaa  | ccc
^^^^^^^^^^^
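A rough sketch of how the auto-numbering in the session above might work, in case it helps the discussion. Everything here is an assumption rather than existing dataflows behaviour: the helper names (`list_checkpoints`, `save_checkpoint`, `resolve_checkpoint`) are hypothetical, and it assumes each checkpoint lives in a numbered subdirectory of a checkpoints directory. It only covers the numbering and the `last` alias, not the actual saving of data.

```python
import tempfile
from pathlib import Path
from typing import List


def list_checkpoints(root: Path) -> List[int]:
    """Return the numbers of existing checkpoints, sorted ascending."""
    if not root.is_dir():
        return []
    return sorted(
        int(p.name) for p in root.iterdir() if p.is_dir() and p.name.isdigit()
    )


def save_checkpoint(root: Path) -> int:
    """Reserve the next free checkpoint number (hypothetical layout: root/<n>/)."""
    existing = list_checkpoints(root)
    number = existing[-1] + 1 if existing else 1
    (root / str(number)).mkdir(parents=True)
    return number


def resolve_checkpoint(root: Path, ref: str) -> int:
    """Map a CLI reference ('last' or a number) to an existing checkpoint."""
    existing = list_checkpoints(root)
    if ref == "last":
        if not existing:
            raise LookupError("no checkpoints saved yet")
        return existing[-1]
    number = int(ref)
    if number not in existing:
        raise LookupError(f"checkpoint {number} not found")
    return number


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is written to the project.
    root = Path(tempfile.mkdtemp()) / ".checkpoints"
    print("Saved checkpoint", save_checkpoint(root))
    print("Saved checkpoint", save_checkpoint(root))
    print("Loading from checkpoint", resolve_checkpoint(root, "last"))
```

Scanning the directory on every call keeps the numbering stateless, so separate `dataflows` invocations in a pipe would agree on the next number without a shared counter file.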

This could also be used to support integration with Singer (#16):

$ dataflows singer exchangerates --coin=BTC | dataflows printer
Date | Coin | Rate
------------------------
2017 | BTC | 5000$
2018 | BTC | 20000$
2019 | BTC | 5$

OriHoch commented 5 years ago

implementation option

OriHoch commented 5 years ago

$ dataflows --help
.
.
Processors:
  load <SOURCE>          Load a resource
  dump_to_path <TARGET>  Save a resource
  .
  .

OriHoch commented 5 years ago

Example usage for DevOps / automation, showing possible dataflows arguments:

$ pip install dataflows-kubernetes
$ dataflows --printer --checkpoint kubernetes.get:pods --label=ckan --all-namespaces
Saving checkpoint 1
pod_name | Phase | ... 
------------------------------

# -pc = --printer --checkpoint
$ dataflows --load checkpoint:1 -pc filter_rows --not_equals='{"Phase":"Running"}'
Loading checkpoint 1
Saving checkpoint 2
pod_name | Phase | ... 
---------------- Pending -----

$ dataflows --load checkpoint:2 -pc kubernetes.delete:pods --all-namespaces
Loading checkpoint 2
Deleting pod FOOBAR-iojo1ij12-333...
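For reference, a minimal sketch of how the CLI might turn a processor spec such as `kubernetes.get:pods` or `./my-flow.py:my_step` and its flags into a function name plus Python arguments. This is only one possible reading of the examples above, not an agreed design: `parse_processor_spec`, `parse_value` and `parse_args` are hypothetical names, and the rules (JSON-decode each value, fall back to a plain string, treat a bare flag as `True`) are assumptions.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_processor_spec(spec: str) -> Tuple[str, str]:
    """Split 'module:function'; a bare name is assumed to be a built-in processor."""
    module, sep, func = spec.rpartition(":")
    if not sep:
        return "dataflows", spec
    return module, func


def parse_value(raw: str) -> Any:
    """Decode a value as JSON when possible, otherwise keep it as a string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def parse_args(argv: List[str]) -> Tuple[List[Any], Dict[str, Any]]:
    """Turn CLI tokens into positional and keyword arguments for a processor."""
    args: List[Any] = []
    kwargs: Dict[str, Any] = {}
    for token in argv:
        if token.startswith("--"):
            key, sep, raw = token[2:].partition("=")
            # --all-namespaces becomes all_namespaces=True
            kwargs[key.replace("-", "_")] = parse_value(raw) if sep else True
        else:
            args.append(parse_value(token))
    return args, kwargs


if __name__ == "__main__":
    print(parse_processor_spec("kubernetes.get:pods"))
    print(parse_args(["--label=ckan", "--all-namespaces"]))
    print(parse_args(["--source_key=[\"my_id\"]", "--source_delete=false"]))
```

One consequence of JSON-decoding every value is that `--coin=BTC` stays a string while a value like `2017` would arrive as an integer, which may or may not be what a given processor expects.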

OriHoch commented 5 years ago

implemented here