frictionlessdata / tabulator-py

Python library for reading and writing tabular data via streams.
https://frictionlessdata.io
MIT License

Replace #227

Closed: AcckiyGerman closed this issue 6 years ago

AcckiyGerman commented 6 years ago

As a pipeline user I want to be able to find some strings and replace them automatically when processing data, so I don't need to do it manually.

As a tabulator-py user I want to find and replace strings in my data, using one simple parameter, so that I don't need to write a post-processor script.

Dataset examples where a replace capability is needed:

https://github.com/datasets/cash-surplus-deficit/blob/master/scripts/process.py [lines 27,28]
https://github.com/datasets/co2-ppm/blob/master/scripts/process.sh [line 42]
https://github.com/datasets/gdp-us/blob/master/scripts/process.py [lines 36-45]

Analyse

replace format

Tasks

Original: https://github.com/AcckiyGerman/tabulator-py/issues/2

pwalsh commented 6 years ago

@AcckiyGerman this is very out of scope for the Stream constructor. The post_parse API is designed for these types of use cases - please use it.

@roll I recommend we close this, if you agree.
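For readers landing on this thread, here is a minimal sketch of what a find-and-replace processor in the post_parse style could look like. It assumes the processor receives an iterable of `(row_number, headers, row)` tuples and yields tuples of the same shape. The names `make_replacer` and `strip_placeholders`, the replacement mapping, and the sample rows are hypothetical and only for illustration.

```python
def make_replacer(replacements):
    """Build a post_parse-style processor from an {old: new} mapping.

    `make_replacer` is a hypothetical helper, not part of tabulator.
    """
    def replacer(extended_rows):
        # Each item is assumed to be a (row_number, headers, row) tuple.
        for row_number, headers, row in extended_rows:
            cleaned = []
            for value in row:
                # Only string cells are rewritten; other types pass through.
                if isinstance(value, str):
                    for old, new in replacements.items():
                        value = value.replace(old, new)
                cleaned.append(value)
            yield (row_number, headers, cleaned)
    return replacer


# Hypothetical mapping: blank out two common "missing value" markers.
strip_placeholders = make_replacer({"..": "", "n/a": ""})

# Made-up sample rows standing in for what a stream would produce.
sample = [
    (1, ["year", "value"], ["2001", ".."]),
    (2, ["year", "value"], ["2002", "n/a"]),
    (3, ["year", "value"], ["2003", 4.5]),
]
result = list(strip_placeholders(iter(sample)))
```

With tabulator installed, a processor like this would presumably be wired in as `Stream(source, post_parse=[strip_placeholders])`, which keeps the replacement logic out of the Stream constructor itself.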

AcckiyGerman commented 6 years ago

Yes, I agree - the Stream constructor is the wrong place for such a function :+1:

roll commented 6 years ago

@AcckiyGerman You could always keep a set of tabulator processors inside your own project. It could also be released on PyPI as a separate module providing a set of processors for tabulator. We could look into enabling a plugin system if you are interested. We have a working example for tableschema-py.

AcckiyGerman commented 6 years ago

@roll @pwalsh Thanks for the replies! I'm still learning the frictionless code infrastructure with the aim of using pipelines. I will try to use processor plugins in the pipeline conf file. If I run into any trouble, I'll ask in your Gitter channel :)