Closed AcckiyGerman closed 6 years ago
@AcckiyGerman this is very out of scope for the Stream constructor. The post_parse
API is designed for these types of use cases - please use it.
@roll I recommend we close this, if you agree.
Yes, I agree - the Stream constructor is the wrong place for such a function :+1:
@AcckiyGerman You could always keep a set of tabulator processors inside your own project. They could also be released on PyPI as a separate module providing a set of processors for tabulator. We could look into enabling a plugin system if you are interested. We have a working example for tableschema-py.
@roll @pwalsh Thanks for the replies! I'm still learning the frictionless code infrastructure with the aim of using pipelines. I will try to use processor plugins in the pipeline conf file. If I run into any trouble I'll ask in your gitter channel :)
As a pipeline user I want to be able to find some strings and replace them automatically when processing data, so I don't need to do it manually.
As a tabulator-py user I want to find and replace strings in my data, using one simple parameter, so that I don't need to write a post-processor script.
Dataset examples where we need the ability to replace:
https://github.com/datasets/cash-surplus-deficit/blob/master/scripts/process.py [lines 27,28]
https://github.com/datasets/co2-ppm/blob/master/scripts/process.sh [line 42]
https://github.com/datasets/gdp-us/blob/master/scripts/process.py [lines 36-45]
Analysis
replace format
I will use a dictionary to pass arguments into the Stream constructor, so that later we can extend 'replace' with more keywords:
replace={'old': 'q1', 'new': '-03-31'}
For several replacements we could use a list of dictionaries:
[ ] [+2h] To enable RegExp we could pass 'regex': True. Here's a real example from a script that needs automation:
replace('\.|"|,|\'|:|-|\(|\)', '', regex=True)
In that case the parameter would look like this:
[ ] [+2h] We could specify the column to apply the replace function to (e.g. we need to replace 'q1' with '-03-31', but 'q1' may also occur in other columns that we do not want to break).
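For reference, the proposed replace format can also be prototyped outside the Stream constructor as a post_parse processor, as suggested in the comments. The following is only a minimal sketch: the helper name make_replace_processor and the spec keys 'regex' and 'column' are assumptions based on the ideas in this issue, not part of tabulator. It relies only on the documented shape of post_parse processors, which receive and yield (row_number, headers, row) tuples.

```python
import re


def make_replace_processor(specs):
    """Build a row processor from one replace spec or a list of specs.

    Hypothetical spec keys (taken from this proposal, not from tabulator):
    'old', 'new', optional 'regex' (bool), optional 'column' (header name).
    """
    if isinstance(specs, dict):
        specs = [specs]

    def apply(value, spec):
        # Only string cells are touched; other types pass through unchanged.
        if not isinstance(value, str):
            return value
        if spec.get('regex'):
            return re.sub(spec['old'], spec['new'], value)
        return value.replace(spec['old'], spec['new'])

    def processor(extended_rows):
        for row_number, headers, row in extended_rows:
            row = list(row)
            for spec in specs:
                column = spec.get('column')
                for index, value in enumerate(row):
                    name = headers[index] if headers and index < len(headers) else None
                    # When a column is given, skip every other column.
                    if column is not None and name != column:
                        continue
                    row[index] = apply(value, spec)
            yield (row_number, headers, row)

    return processor


# Restrict the 'q1' replacement to the 'period' column only.
rows = [(1, ['period', 'note'], ['2017q1', 'q1 estimate'])]
by_column = make_replace_processor(
    {'old': 'q1', 'new': '-03-31', 'column': 'period'})
result = list(by_column(rows))

# Regex spec using the pattern quoted above, applied to every column.
strip_punct = make_replace_processor(
    {'old': r'\.|"|,|\'|:|-|\(|\)', 'new': '', 'regex': True})
cleaned = list(strip_punct([(1, None, ['U.S. (total), 1-2'])]))
```

With tabulator installed, such a processor would presumably be passed as Stream(source, post_parse=[by_column]); a list of spec dictionaries is applied in order to each row.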
Tasks
replace: "old", "new" [20m]
Original: https://github.com/AcckiyGerman/tabulator-py/issues/2