embulk / embulk-base-restclient

Base class library for Embulk plugins to access RESTful services
https://www.embulk.org/
Apache License 2.0
6 stars 7 forks source link

Check isNull(column) at SinglePageRecordReader to avoid ArrayIndexOutOfBoundsException #86

Closed sakama closed 7 years ago

sakama commented 7 years ago

I'm using embulk-base-resetclient with embulk-output-elasticsearch.

I sometimes got ArrayIndexOutOfBoundsException while exporting to Es server. I'm using embulk-base-resetclient v0.4.2 for now, but confirmed that also happens with v0.5.1. It seems that failure happens when msgpack formatted file contains empty String column.

This problem never happens when source file is (JSON|CSV), only happens with MessagePack formatted files. Then, I may need to fix embulk-parser-msgpack.

Stacktrace

org.embulk.exec.PartialExecutionException: java.lang.ArrayIndexOutOfBoundsException: 123456789
  at org.embulk.exec.BulkLoader$LoaderState.buildPartialExecuteException(BulkLoader.java:373) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.BulkLoader.doRun(BulkLoader.java:591) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.BulkLoader.access$000(BulkLoader.java:33) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:389) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:385) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.Exec.doWith(Exec.java:25) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.BulkLoader.run(BulkLoader.java:385) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.EmbulkEmbed.run(EmbulkEmbed.java:180) ~[embulk-core-0.8.18.jar:na]
  at com.treasuredata.worker.calls.EmbulkResult.run(EmbulkResult.java:69) ~[td-worker-calls-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.7.0_80]
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[na:1.7.0_80]
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_80]
  at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_80]
  at com.treasuredata.java_call.JavaCall.processRequest(JavaCall.java:147) [td-worker-calls-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
  at com.treasuredata.java_call.JavaCall.run(JavaCall.java:111) [td-worker-calls-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
  at com.treasuredata.worker.JavaCallMain.main(JavaCallMain.java:21) [td-worker-calls-0.1.0-SNAPSHOT.jar:0.1.0-SNAPSHOT]
Caused by: java.lang.ArrayIndexOutOfBoundsException: 123456789
  at java.util.Arrays$ArrayList.get(Arrays.java:2866) ~[na:1.7.0_80]
  at org.embulk.spi.Page.getStringReference(Page.java:53) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.PageReader.getString(PageReader.java:111) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.PageReader.getString(PageReader.java:105) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.base.restclient.record.SinglePageRecordReader.getString(SinglePageRecordReader.java:67) ~[na:na]
  at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope$1.stringColumn(JacksonAllInObjectScope.java:62) ~[na:na]
  at org.embulk.spi.Column.visit(Column.java:58) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.Schema.visitColumns(Schema.java:81) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope.scopeObject(JacksonAllInObjectScope.java:36) ~[na:na]
  at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:17) ~[na:na]
  at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:9) ~[na:na]
  at org.embulk.base.restclient.record.ValueExporter.exportValueToBuildRecord(ValueExporter.java:14) ~[na:na]
  at org.embulk.base.restclient.record.RecordExporter.exportRecord(RecordExporter.java:18) ~[na:na]
  at org.embulk.base.restclient.RestClientPageOutput.add(RestClientPageOutput.java:43) ~[na:na]
  at org.embulk.spi.PageBuilder.doFlush(PageBuilder.java:244) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.PageBuilder.flush(PageBuilder.java:250) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.PageBuilder.addRecord(PageBuilder.java:227) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.parser.msgpack.MsgpackParserPlugin.run(MsgpackParserPlugin.java:280) ~[na:na]
  at org.embulk.spi.FileInputRunner.run(FileInputRunner.java:153) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.util.Executors.process(Executors.java:67) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.spi.util.Executors.process(Executors.java:42) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.LocalExecutorPlugin$DirectExecutor$1.call(LocalExecutorPlugin.java:184) ~[embulk-core-0.8.18.jar:na]
  at org.embulk.exec.LocalExecutorPlugin$DirectExecutor$1.call(LocalExecutorPlugin.java:180) ~[embulk-core-0.8.18.jar:na]
  at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_80]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_80]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_80]
  at java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]
java.lang.ArrayIndexOutOfBoundsException: 123456789
org.embulk.exec.PartialExecutionException: java.lang.ArrayIndexOutOfBoundsException: 123456789
  at org.embulk.exec.BulkLoader$LoaderState.buildPartialExecuteException(BulkLoader.java:373)
  at org.embulk.exec.BulkLoader.doRun(BulkLoader.java:591)
  at org.embulk.exec.BulkLoader.access$000(BulkLoader.java:33)
  at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:389)
  at org.embulk.exec.BulkLoader$1.run(BulkLoader.java:385)
  at org.embulk.spi.Exec.doWith(Exec.java:25)
  at org.embulk.exec.BulkLoader.run(BulkLoader.java:385)
  at org.embulk.EmbulkEmbed.run(EmbulkEmbed.java:180)
  at com.treasuredata.worker.calls.EmbulkResult.run(EmbulkResult.java:69)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at com.treasuredata.java_call.JavaCall.processRequest(JavaCall.java:147)
  at com.treasuredata.java_call.JavaCall.run(JavaCall.java:111)
  at com.treasuredata.worker.JavaCallMain.main(JavaCallMain.java:21)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 123456789
  at java.util.Arrays$ArrayList.get(Arrays.java:2866)
  at org.embulk.spi.Page.getStringReference(Page.java:53)
  at org.embulk.spi.PageReader.getString(PageReader.java:111)
  at org.embulk.spi.PageReader.getString(PageReader.java:105)
  at org.embulk.base.restclient.record.SinglePageRecordReader.getString(SinglePageRecordReader.java:67)
  at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope$1.stringColumn(JacksonAllInObjectScope.java:62)
  at org.embulk.spi.Column.visit(Column.java:58)
  at org.embulk.spi.Schema.visitColumns(Schema.java:81)
  at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope.scopeObject(JacksonAllInObjectScope.java:36)
  at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:17)
  at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:9)
  at org.embulk.base.restclient.record.ValueExporter.exportValueToBuildRecord(ValueExporter.java:14)
  at org.embulk.base.restclient.record.RecordExporter.exportRecord(RecordExporter.java:18)
  at org.embulk.base.restclient.RestClientPageOutput.add(RestClientPageOutput.java:43)
  at org.embulk.spi.PageBuilder.doFlush(PageBuilder.java:244)
  at org.embulk.spi.PageBuilder.flush(PageBuilder.java:250)
  at org.embulk.spi.PageBuilder.addRecord(PageBuilder.java:227)
  at org.embulk.parser.msgpack.MsgpackParserPlugin.run(MsgpackParserPlugin.java:280)
  at org.embulk.spi.FileInputRunner.run(FileInputRunner.java:153)
  at org.embulk.spi.util.Executors.process(Executors.java:67)
  at org.embulk.spi.util.Executors.process(Executors.java:42)
  at org.embulk.exec.LocalExecutorPlugin$DirectExecutor$1.call(LocalExecutorPlugin.java:184)
  at org.embulk.exec.LocalExecutorPlugin$DirectExecutor$1.call(LocalExecutorPlugin.java:180)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
ylzen commented 7 years ago

@sakama Is this PR ready for review?

sakama commented 7 years ago

I don't have perfect confident and will test again.

sakama commented 7 years ago

@dmikurube At least, I can insert data as expected without any failures. Could you take a look?

muga commented 7 years ago

I just wanted to left comments:

dmikurube commented 7 years ago

@sakama That SinglePageRecordReader is really just forwarding method calls into PageReader. If this is a problem, I'm afraid we need to fix Embulk's PageReader instead. What do you think? Cc: @muga

muga commented 7 years ago

@dmikurube Agreed. https://github.com/embulk/embulk/pull/654

dmikurube commented 7 years ago

@muga lgtm'ed the PR on embulk/embulk.

@sakama If we still need an immediate workaround, I'd merge this when a comment is appended to describe it's just a workaround (with a link to an issue or this PR).

sakama commented 7 years ago

I checked https://github.com/embulk/embulk/pull/654 at my env. I got NullPointerException instead of ArrayIndexOutOfBoundsException. This should valid behavior and I think I can avoid by adding null check at JacksonAllInObjectScope.

I'll create another PR instead of this PR.

Caused by: java.lang.NullPointerException
    at org.embulk.spi.time.TimestampFormatter.format(TimestampFormatter.java:93)
    at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope$1.timestampColumn(JacksonAllInObjectScope.java:74)
    at org.embulk.spi.Column.visit(Column.java:60)
    at org.embulk.spi.Schema.visitColumns(Schema.java:81)
    at org.embulk.base.restclient.jackson.scope.JacksonAllInObjectScope.scopeObject(JacksonAllInObjectScope.java:36)
    at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:17)
    at org.embulk.base.restclient.jackson.scope.JacksonObjectScopeBase.scopeEmbulkValues(JacksonObjectScopeBase.java:9)
    at org.embulk.base.restclient.record.ValueExporter.exportValueToBuildRecord(ValueExporter.java:14)
    at org.embulk.base.restclient.record.RecordExporter.exportRecord(RecordExporter.java:18)
    at org.embulk.base.restclient.RestClientPageOutput.add(RestClientPageOutput.java:43)
    at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:394)
    at org.embulk.exec.LocalExecutorPlugin$ScatterTransactionalPageOutput$OutputWorker.call(LocalExecutorPlugin.java:319)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
    Suppressed: java.lang.NullPointerException
        ... 16 more
    Suppressed: java.lang.NullPointerException
        ... 16 more
    Suppressed: java.lang.NullPointerException
        ... 16 more
dmikurube commented 7 years ago

@sakama Ah, that sounds the right fix. Thanks!

muga commented 7 years ago

@sakama good catch 👍