frictionlessdata / datapackage-pipelines

Framework for processing data packages in pipelines of modular components.
https://frictionlessdata.io/
MIT License

Processor to find and replace patterns in field(s) #114

Closed: zelima closed this issue 6 years ago

zelima commented 6 years ago

As a dpp user, I want to be able to find certain strings and replace or remove them, so I don't need to do it manually.

As a dpp user, I want to be able to replace certain patterns in field values so that I can clean my data the way I need.

For example, one of the fields in my data may contain footnote anchors:

year,country,number
2000,XXX,12345
2001 (2),XXX,12345
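A minimal Python sketch of the cleanup this example calls for, using only the standard `re` and `csv` modules. `strip_anchor` is a hypothetical helper, not part of dpp; the regex is the same one the spec below gives for `my_field`, and group 1 keeps only the four-digit year.

```python
import csv
import io
import re

# Sample rows from the issue; "2001 (2)" carries a footnote anchor.
SAMPLE = """year,country,number
2000,XXX,12345
2001 (2),XXX,12345
"""

# A four-digit year, a space, then a parenthesised anchor such as "(2)".
ANCHOR = re.compile(r"([0-9]{4}) (\(\w+\))")


def strip_anchor(value: str) -> str:
    """Rewrite 'YYYY (x)' to 'YYYY'; values without an anchor are unchanged."""
    return ANCHOR.sub(r"\1", value)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
cleaned_years = [strip_anchor(row["year"]) for row in rows]
print(cleaned_years)  # ['2000', '2001']
```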

Acceptance Criteria

Tasks

Analysis

Spec:


- 
  run: find_replace
  parameters:
    resources: as always
    fields:
      - 
        name: my_field
        patterns:
          -
            find: '([0-9]{4}) (\(\w+\))'
            replace: '\1'
      - 
        name: my_second_field
        patterns:
          - 
            find: Q1
            replace: '03-31'
          - 
            find: Q2
            replace: '06-30'
          - 
            find: Q3
            replace: '09-30'
          - 
            find: Q4
            replace: '12-31'
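One way the row-level logic of such a processor could look, as a sketch only: it assumes rows arrive as plain dicts and that `fields` has already been parsed from the YAML into the shape shown in `FIELDS`. The names `compile_fields` and `find_replace` are hypothetical, and this is not the actual dpp implementation. Each `find` is compiled once and applied with Python's `re.sub`, whose `\1` backreference syntax matches the `replace: \1` entry in the spec. Patterns for a field run in order, so a later pattern sees the output of an earlier one.

```python
import re
from typing import Dict, Iterable, Iterator, List

# Hypothetical in-memory form of the spec's `fields` parameter.
FIELDS = [
    {"name": "my_field",
     "patterns": [{"find": r"([0-9]{4}) (\(\w+\))", "replace": r"\1"}]},
    {"name": "my_second_field",
     "patterns": [{"find": "Q1", "replace": "03-31"},
                  {"find": "Q2", "replace": "06-30"},  # June has 30 days
                  {"find": "Q3", "replace": "09-30"},
                  {"find": "Q4", "replace": "12-31"}]},
]


def compile_fields(fields: List[dict]) -> Dict[str, list]:
    """Compile every `find` pattern once, keyed by field name."""
    return {
        f["name"]: [(re.compile(p["find"]), p["replace"]) for p in f["patterns"]]
        for f in fields
    }


def find_replace(rows: Iterable[dict], fields: List[dict]) -> Iterator[dict]:
    """Yield copies of `rows` with each configured field rewritten in order."""
    compiled = compile_fields(fields)
    for row in rows:
        row = dict(row)  # copy so the caller's row is not mutated
        for name, patterns in compiled.items():
            value = row.get(name)
            if not isinstance(value, str):
                continue  # missing or non-string cells are left untouched
            for regex, replacement in patterns:
                value = regex.sub(replacement, value)
            row[name] = value
        yield row


sample = [{"my_field": "2001 (2)", "my_second_field": "2015-Q2"}]
print(list(find_replace(sample, FIELDS)))
# [{'my_field': '2001', 'my_second_field': '2015-06-30'}]
```

Because `replace` is treated as an `re.sub` template, a literal backslash in a replacement value would need to be escaped; a processor could instead pass a function as the replacement if literal strings were preferred.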