embulk / embulk-base-restclient

Base class library for Embulk plugins to access RESTful services
https://www.embulk.org/
Apache License 2.0

[Bug Report] JSON array is not supported in JSON column #136

Open: calorie opened this issue 3 years ago

calorie commented 3 years ago

Environment

Reproduction

https://github.com/calorie/embulk-repro

$ docker-compose run --rm embulk run json_to_es.yml
Creating embulk-repro_embulk_run ... done
2021-01-07 10:50:06.065 +0000: Embulk v0.9.23
2021-01-07 10:50:07.328 +0000 [WARN] (main): DEPRECATION: JRuby org.jruby.embed.ScriptingContainer is directly injected.
2021-01-07 10:50:10.758 +0000 [INFO] (main): Gem's home and path are set by default: "/root/.embulk/lib/gems"
2021-01-07 10:50:11.828 +0000 [INFO] (main): Started Embulk v0.9.23
2021-01-07 10:50:12.009 +0000 [INFO] (0001:transaction): Loaded plugin embulk-output-elasticsearch (0.4.7)
2021-01-07 10:50:12.065 +0000 [INFO] (0001:transaction): Loaded plugin embulk-parser-jsonl (0.2.1)
2021-01-07 10:50:12.096 +0000 [INFO] (0001:transaction): Listing local files at directory '.' filtering filename by prefix 'json_payload.json'
2021-01-07 10:50:12.098 +0000 [INFO] (0001:transaction): "follow_symlinks" is set false. Note that symbolic links to directories are skipped.
2021-01-07 10:50:12.118 +0000 [INFO] (0001:transaction): Loading files [./json_payload.json]
2021-01-07 10:50:12.162 +0000 [INFO] (0001:transaction): Using local thread executor with max_threads=8 / output tasks 4 = input tasks 1 * 4
2021-01-07 10:50:12.191 +0000 [INFO] (0001:transaction): Logging initialized @6740ms
2021-01-07 10:50:12.681 +0000 [INFO] (0001:transaction): Connecting to Elasticsearch version:7.10.1
2021-01-07 10:50:12.681 +0000 [INFO] (0001:transaction): Executing plugin with 'replace' mode.
2021-01-07 10:50:12.715 +0000 [INFO] (0001:transaction): Inserting data into index[test_20210107-105011]
2021-01-07 10:50:12.724 +0000 [INFO] (0001:transaction): {done:  0 / 1, running: 0}
2021-01-07 10:50:12.869 +0000 [INFO] (0001:transaction): {done:  1 / 1, running: 0}
org.embulk.exec.PartialExecutionException: org.embulk.spi.DataException: Expected object node: [{"b":1}]
        at org.embulk.exec.BulkLoader$LoaderState.buildPartialExecuteException(BulkLoader.java:340)
        at org.embulk.exec.BulkLoader.doRun(BulkLoader.java:566)
        at org.embulk.exec.BulkLoader.access$000(BulkLoader.java:35)
        at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:353)
        at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:350)
        at org.embulk.spi.Exec.doWith(Exec.java:22)
        at org.embulk.exec.BulkLoader.run(BulkLoader.java:350)
        at org.embulk.EmbulkEmbed.run(EmbulkEmbed.java:242)
        at org.embulk.EmbulkRunner.runInternal(EmbulkRunner.java:291)
        at org.embulk.EmbulkRunner.run(EmbulkRunner.java:155)
        at org.embulk.cli.EmbulkRun.runSubcommand(EmbulkRun.java:431)
        at org.embulk.cli.EmbulkRun.run(EmbulkRun.java:90)
        at org.embulk.cli.Main.main(Main.java:64)
Caused by: org.embulk.spi.DataException: Expected object node: [{"b":1}]
        at org.embulk.base.restclient.jackson.StringJsonParser.parseJsonObject(StringJsonParser.java:31)
        at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope$1.jsonColumn(JacksonAllInObjectScope.java:115)
        at org.embulk.spi.Column.visit(Column.java:56)
        at org.embulk.spi.Schema.visitColumns(Schema.java:68)
        at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope.scopeObject(JacksonAllInObjectScope.java:47)
        at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:17)
        at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:9)
        at org.embulk.base.restclient.record.ValueExporter.exportValueToBuildRecord(ValueExporter.java:14)
        at org.embulk.base.restclient.record.RecordExporter.exportRecord(RecordExporter.java:18)
        at org.embulk.base.restclient.RestClientPageOutput.add(RestClientPageOutput.java:43)
        at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:351)
        at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:291)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Error: org.embulk.spi.DataException: Expected object node: [{"b":1}]

json_to_es.yml

in:
  type: file
  path_prefix: ./repro.json
  parser:
    type: json
    columns:
    - {name: a, type: json}
out:
  type: elasticsearch
  mode: replace
  nodes:
  - {host: elasticsearch, port: 9200}
  index: test
  index_type: test

repro.json

{"a": [{"b": 1}]}

Expected

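I want to be able to use a JSON array as the value of a json-type column. For that, parseJsonArray needs to be used here, where JacksonAllInObjectScope currently calls parseJsonObject (see the stack trace above):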

https://github.com/embulk/embulk-base-restclient/blob/6af1941d999d2f06b0de118e2b4a4e815d6d805c/src/main/java/org/embulk/base/restclient/jackson/scope/JacksonAllInObjectScope.java#L103-L113
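The proposal above amounts to dispatching on the top-level type of the JSON value instead of always requiring an object. The following is only a minimal sketch under stated assumptions: it does not use the real embulk-base-restclient or Jackson APIs, the names `classify`, `requireObject` and `describe` are hypothetical, and it classifies the text by its first non-whitespace character, whereas an actual patch would inspect the parsed Jackson node (for example `JsonNode.isObject()` / `isArray()`).

```java
/**
 * Hypothetical sketch of the proposed fix. Not the library's API:
 * all names here are illustrative only.
 */
public class JsonColumnDispatch {
    enum Kind { OBJECT, ARRAY, OTHER }

    // Simplification: classify JSON text by its first non-whitespace character.
    static Kind classify(String json) {
        String t = json.trim();
        if (t.isEmpty()) {
            throw new IllegalArgumentException("empty JSON text");
        }
        char c = t.charAt(0);
        if (c == '{') return Kind.OBJECT;
        if (c == '[') return Kind.ARRAY;
        return Kind.OTHER;
    }

    // Mimics the reported behavior: anything that is not an object is rejected.
    static String requireObject(String json) {
        if (classify(json) != Kind.OBJECT) {
            throw new IllegalStateException("Expected object node: " + json);
        }
        return json;
    }

    // Proposed behavior: send arrays down an array path instead of failing.
    static String describe(String json) {
        switch (classify(json)) {
            case OBJECT: return "parse as object";
            case ARRAY:  return "parse as array";
            default:     return "unsupported top-level value";
        }
    }

    public static void main(String[] args) {
        String value = "[{\"b\":1}]"; // value of column "a" in repro.json
        System.out.println(describe(value));
        try {
            requireObject(value);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // same message as the DataException above
        }
    }
}
```

Running `main` prints `parse as array` followed by `Expected object node: [{"b":1}]`, which shows the strict object check reproducing the reported message while the dispatch routes the same value to an array path.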

Ref: a Twitter post (in Japanese)

@hiroyuki-sato Thank you for your support.

hiroyuki-sato commented 3 years ago

@dmikurube Could you take a look when you get a chance?

A test case for future reference:

test.csv

a,b,c,d,e,f
1,"\"test\"","{ \"a\":\"abc\" }","[1,2,3]",true,null
1,"\"test\"","{ \"a\":\"abc\" }","[1,2,3]",true,null
1,"\"test\"","{ \"a\":\"abc\" }","[1,2,3]",true,null
1,"\"test\"","{ \"a\":\"abc\" }","[1,2,3]",true,null

in:
  type: file
  path_prefix: test.csv
  parser:
    charset: UTF-8
    newline: LF
    type: csv
    delimiter: ','
    quote: '"'
    escape: \
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    columns:
    - {name: a, type: json}
    - {name: b, type: json}
    - {name: c, type: json}
    - {name: d, type: json}
    - {name: e, type: json}
    - {name: f, type: json}
out: {type: stdout}
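For this test case, it may help to record which top-level JSON type each column is expected to hold, since a json column should accept all of them rather than only objects. The sketch below is a hypothetical helper, not part of Embulk: it assumes the CSV parser, configured with `quote: '"'` and `escape: \`, turns each field into the unquoted, unescaped text shown in `values`, and it classifies that text by its first token without fully validating it.

```java
/**
 * Hypothetical helper for the test case above: expected top-level JSON
 * type of each column (a to f) in one row of test.csv.
 */
public class ExpectedJsonTypes {
    enum JsonType { OBJECT, ARRAY, STRING, NUMBER, BOOLEAN, NULL }

    static JsonType typeOf(String json) {
        String t = json.trim();
        if (t.isEmpty()) {
            throw new IllegalArgumentException("empty JSON text");
        }
        switch (t.charAt(0)) {
            case '{': return JsonType.OBJECT;
            case '[': return JsonType.ARRAY;
            case '"': return JsonType.STRING;
            default:  break;
        }
        if (t.equals("true") || t.equals("false")) return JsonType.BOOLEAN;
        if (t.equals("null")) return JsonType.NULL;
        // Anything else is assumed to be a number; this sketch does not validate it.
        return JsonType.NUMBER;
    }

    public static void main(String[] args) {
        // Assumed field values of one test.csv row after CSV unquoting/unescaping.
        String[] names = {"a", "b", "c", "d", "e", "f"};
        String[] values = {"1", "\"test\"", "{ \"a\":\"abc\" }", "[1,2,3]", "true", "null"};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": " + values[i] + " -> " + typeOf(values[i]));
        }
    }
}
```

With the six columns covering a number, a string, an object, an array, a boolean and null, a fix that only adds the array case would still need to be checked against the remaining scalar types.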