A filter plugin for Embulk to filter out columns
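Configuration:
- columns: columns to retain (array of hash)
  - name: name of the column (required)
  - src: name of the source column to copy from (default: name)
  - default: value used when the input is null
  - format: format of the default timestamp (default: default_timestamp_format)
  - timezone: timezone of the default timestamp (default: default_timezone)
- add_columns: columns to add (array of hash)
  - name: name of the column (required)
  - src: name of the source column to copy from (either src or default is required)
  - default: value of the column (either src or default is required)
  - type: type of the column, used together with default
  - format: format of the default timestamp (default: default_timestamp_format)
  - timezone: timezone of the default timestamp (default: default_timezone)
- drop_columns: columns to drop (array of hash)
  - name: name of the column (required)
- default_timestamp_format: default format for timestamp columns (default: %Y-%m-%d %H:%M:%S.%N %z)
- default_timezone: default timezone for timestamp columns (default: UTC)
Say input.csv is as follows: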
time,id,key,score
2015-07-13,0,Vqjht6YE,1370
2015-07-13,1,VmjbjAA0,3962
2015-07-13,2,C40P5H1W,7323
filters:
  - type: column
    columns:
      - {name: time, default: "2015-07-13", format: "%Y-%m-%d"}
      - {name: id}
      - {name: key, default: "foo"}
This reduces the columns to only time, id, and key:
time,id,key
2015-07-13,0,Vqjht6YE
2015-07-13,1,VmjbjAA0
2015-07-13,2,C40P5H1W
Note that column types are automatically retrieved from input data (inputSchema).
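The retain-and-default behavior of the columns option can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not the plugin's implementation; the function name retain_columns is hypothetical, and treating an empty CSV cell as null is an assumption.

```python
import csv
import io

INPUT_CSV = """time,id,key,score
2015-07-13,0,Vqjht6YE,1370
2015-07-13,1,VmjbjAA0,3962
2015-07-13,2,C40P5H1W,7323
"""

# Mirrors the `columns` option above: names to retain, with optional defaults.
COLUMNS = [
    {"name": "time", "default": "2015-07-13"},
    {"name": "id"},
    {"name": "key", "default": "foo"},
]


def retain_columns(rows, columns):
    """Keep only the listed columns, substituting `default` for null values."""
    result = []
    for row in rows:
        kept = {}
        for col in columns:
            value = row.get(col["name"])
            # Assumption: an empty CSV cell counts as null.
            if value is None or value == "":
                value = col.get("default")
            kept[col["name"]] = value
        result.append(kept)
    return result


rows = list(csv.DictReader(io.StringIO(INPUT_CSV)))
filtered = retain_columns(rows, COLUMNS)
for row in filtered:
    print(",".join(str(v) for v in row.values()))
```

The score column disappears because it is not listed, and the defaults only take effect for rows where time or key is missing.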
Say input.csv is as follows:
time,id,key,score
2015-07-13,0,Vqjht6YE,1370
2015-07-13,1,VmjbjAA0,3962
2015-07-13,2,C40P5H1W,7323
filters:
  - type: column
    add_columns:
      - {name: d, type: timestamp, default: "2015-07-13", format: "%Y-%m-%d"}
      - {name: copy_id, src: id}
This adds the d column and the copy_id column, which is a copy of the id column:
time,id,key,score,d,copy_id
2015-07-13,0,Vqjht6YE,1370,2015-07-13,0
2015-07-13,1,VmjbjAA0,3962,2015-07-13,1
2015-07-13,2,C40P5H1W,7323,2015-07-13,2
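As a rough illustration of the add_columns semantics, the sketch below appends one column filled from default and one copied from src. It is not the plugin's code: the helper name add_columns_to_rows is hypothetical, timestamps are kept as plain strings, and preferring src when both src and default are given is an assumption.

```python
import csv
import io

INPUT_CSV = """time,id,key,score
2015-07-13,0,Vqjht6YE,1370
2015-07-13,1,VmjbjAA0,3962
2015-07-13,2,C40P5H1W,7323
"""

# Mirrors the `add_columns` option above.
ADD_COLUMNS = [
    {"name": "d", "type": "timestamp", "default": "2015-07-13", "format": "%Y-%m-%d"},
    {"name": "copy_id", "src": "id"},
]


def add_columns_to_rows(rows, specs):
    """Append each new column, copied from `src` or filled with `default`."""
    result = []
    for row in rows:
        new_row = dict(row)
        for spec in specs:
            if "src" in spec:
                # Assumption: `src` wins if both `src` and `default` are set.
                new_row[spec["name"]] = row[spec["src"]]
            elif "default" in spec:
                new_row[spec["name"]] = spec["default"]
            else:
                raise ValueError("either src or default is required")
        result.append(new_row)
    return result


rows = list(csv.DictReader(io.StringIO(INPUT_CSV)))
added = add_columns_to_rows(rows, ADD_COLUMNS)
for row in added:
    print(",".join(row.values()))
```

Every output row keeps its four original columns and gains d and copy_id at the end.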
Say input.csv is as follows:
time,id,key,score
2015-07-13,0,Vqjht6YE,1370
2015-07-13,1,VmjbjAA0,3962
2015-07-13,2,C40P5H1W,7323
filters:
  - type: column
    drop_columns:
      - {name: time}
      - {name: id}
This drops the time and id columns:
key,score
Vqjht6YE,1370
VmjbjAA0,3962
C40P5H1W,7323
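The drop_columns semantics are the complement of columns: everything is kept except the named columns. A minimal Python sketch of that idea follows; the function name drop_columns_from_rows is hypothetical and the code is not taken from the plugin.

```python
import csv
import io

INPUT_CSV = """time,id,key,score
2015-07-13,0,Vqjht6YE,1370
2015-07-13,1,VmjbjAA0,3962
2015-07-13,2,C40P5H1W,7323
"""

# Mirrors the `drop_columns` option above.
DROP_COLUMNS = [{"name": "time"}, {"name": "id"}]


def drop_columns_from_rows(rows, specs):
    """Remove the named columns and keep the rest in their original order."""
    dropped = {spec["name"] for spec in specs}
    return [
        {key: value for key, value in row.items() if key not in dropped}
        for row in rows
    ]


rows = list(csv.DictReader(io.StringIO(INPUT_CSV)))
remaining = drop_columns_from_rows(rows, DROP_COLUMNS)
for row in remaining:
    print(",".join(row.values()))
```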
For a type: json column, you can specify a JSONPath as the column's name:
- {name: $.payload.key1}
- {name: "$.payload.array[0]"}
- {name: "$.payload.array[*]"}
- {name: "$['payload']['key1.key2']"}
The following JSONPath operators are not supported:
['name','name']
[1,2]
[1:2]
[?(<expression>)]
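To check a path against this list before putting it in a config, a small pattern-based detector can help. The sketch below is a hypothetical helper, not part of the plugin; its regular expressions are a rough approximation of the four unsupported forms and make no claim to cover the full JSONPath grammar.

```python
import re

# Each entry pairs a label with a pattern approximating one unsupported form.
UNSUPPORTED = [
    ("union of names", re.compile(r"\[\s*'[^']*'\s*(?:,\s*'[^']*'\s*)+\]")),
    ("union of indexes", re.compile(r"\[\s*\d+\s*(?:,\s*\d+\s*)+\]")),
    ("slice", re.compile(r"\[\s*-?\d*\s*:\s*-?\d*\s*(?::\s*-?\d*\s*)?\]")),
    ("filter expression", re.compile(r"\[\s*\?\(.*?\)\s*\]")),
]


def unsupported_operators(path):
    """Return the labels of unsupported operator forms found in `path`."""
    return [label for label, pattern in UNSUPPORTED if pattern.search(path)]


for candidate in [
    "$.payload.key1",
    "$.payload.array[0]",
    "$.payload.array[*]",
    "$['payload']['key1.key2']",
    "$.payload['a','b']",
    "$.payload.array[1,2]",
    "$.payload.array[1:2]",
    "$.payload.array[?(@.score > 10)]",
]:
    print(candidate, unsupported_operators(candidate))
```

The first four paths are the supported forms shown earlier and produce an empty list; the last four each trigger one label.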
Note that type: timestamp is not available for add_columns or columns on JSON paths, because Embulk's type: json cannot contain a timestamp value.
Also note that renaming or copying JSON paths with the src option is only partially supported so far. The parent JSON path of name and src must be the same, for example:
- {name: $.payload.foo.dest, src: $.payload.foo.src}
The following example does not work yet, because the parent paths ($.payload.foo and $.payload.bar) differ:
- {name: $.payload.foo.dest, src: $.payload.bar.src}
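The same-parent rule can be checked mechanically. The sketch below is a hypothetical validator, not the plugin's logic, and it assumes dot-notation paths only (no bracket notation or keys containing dots).

```python
def parent_path(path):
    """Return `path` without its last dot-separated segment."""
    parent, separator, _leaf = path.rpartition(".")
    if not separator:
        raise ValueError(f"no parent path in {path!r}")
    return parent


def has_same_parent(name, src):
    """True when `name` and `src` sit under the same parent JSON path."""
    return parent_path(name) == parent_path(src)


# Same parent ($.payload.foo): the kind of copy described as supported.
print(has_same_parent("$.payload.foo.dest", "$.payload.foo.src"))
# Different parents ($.payload.foo vs. $.payload.bar): not supported yet.
print(has_same_parent("$.payload.foo.dest", "$.payload.bar.src"))
```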
Run example:
$ ./gradlew gem
$ embulk preview -I build/gemContents/lib example/example.yml
Run test:
$ ./gradlew test
Run test with coverage reports:
$ ./gradlew test jacocoTestReport
$ open build/reports/jacoco/test/html/index.html
Run checkstyle:
$ ./gradlew check
Run only checkstyle:
$ ./gradlew checkstyleMain
$ ./gradlew checkstyleTest
Modify version in build.gradle at a detached commit, and then tag the commit with an annotation.
git checkout --detach master
(Edit: Remove "-SNAPSHOT" in "version" in build.gradle.)
git add build.gradle
git commit -m "Release vX.Y.Z"
git tag -a vX.Y.Z
(Edit: Write a tag annotation in the changelog format.)
See Keep a Changelog for the changelog format. We adopt part of it for Git tag annotations, as shown below.
## [X.Y.Z] - YYYY-MM-DD
### Added
- Added a feature.
### Changed
- Changed something.
### Fixed
- Fixed a bug.
Then push the annotated tag. This triggers a release operation on GitHub Actions after approval.
git push -u origin vX.Y.Z
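If you prefer to generate the tag annotation rather than type it by hand, the changelog-style text shown earlier is easy to render. This is a convenience sketch only; the function tag_annotation and its parameters are hypothetical and not part of this repository's tooling.

```python
from datetime import date


def tag_annotation(version, released, added=(), changed=(), fixed=()):
    """Render a tag annotation in the changelog format used for releases."""
    lines = [f"## [{version}] - {released.isoformat()}"]
    sections = (("Added", added), ("Changed", changed), ("Fixed", fixed))
    for title, items in sections:
        if items:  # skip sections with nothing to report
            lines.append(f"### {title}")
            lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


print(
    tag_annotation(
        "X.Y.Z",
        date(2015, 7, 13),
        added=["Added a feature."],
        fixed=["Fixed a bug."],
    )
)
```

The printed text can be pasted into the editor that git tag -a opens.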