embulk / embulk-base-restclient

Base class library for Embulk plugins to access RESTful services
https://www.embulk.org/
Apache License 2.0
6 stars 7 forks source link

Change the place where finish() is called in RestClientInputPluginBaseUnsafe. #71

Closed muga closed 7 years ago

muga commented 7 years ago

PageBuilder#finish() is called when ingestServiceData failed. The finish() needs to be called when PageBuilder finishes writing Page objects.

As background, when PreviewExecutor executes my plugin that uses this library, it throws NullPointerException reported on https://github.com/embulk/embulk/issues/570. The cause of the issue is that PageBuilder#finish method is called twice within preview. I'm trying to fix Embulk itself to avoid this situation https://github.com/embulk/embulk/pull/571. This PR fixes the cause that the finish() is called twice within PreviewExecutor.

dmikurube commented 7 years ago

@muga Some questions:

  1. Is this fix required only for Preview?
  2. What impacts can occur in non-Preview executions by this change?
  3. I'm still not sure why calling finish() there can finally make finish() called twice. It's from how PreviewExecutor works?
muga commented 7 years ago

For 1, this is required for both of preview and run. finish() is called when the input plugin correctly finishes operations for Page objects like addRecord(). The addRecord method is called when the plugin correctly finishes writing rows to the Page object.

For 2, this doesn't have any impacts for run and guess. PreviewExecutor only has the logic that uses the spec of finish(). Other executors don't have it for now but they might be able to to have it.

For 3, This is the scenario that finish() is called twice:

  1. PreviewExecutor creates SamplingPageOutput object and passes it to input plugin (PreviewExecutor.java#L122)
  2. call finish() when Input plugin finishes writing data to Page object. (PreviewExecutor.java#L170)
  3. throw PreviewedNoticeError within the finish() method. (PreviewExecutor.java#L182)
  4. The PreviewedNoticeError is thrown in ingestServiceData method and then, 2nd finish() is called within finally block. (RestClientInputPluginBaseUnsafe.java#L118)
dmikurube commented 7 years ago

@muga Sorry, I don't understand yet. That finish() in your step 2 is PageOutput#finish, and the other finish() in your step 4 is PageBuilder#finish. They're different?

muga commented 7 years ago

@dmikurube PageBuilder#finish calls PageOutput#finish. Sorry, I should've commented it as 5.

dmikurube commented 7 years ago

@muga Okay. Then, sounds it is very exceptional in PreviewExecutor that SamplingPageOutput#add calls its own finish by itself. I thought that PageBuidler#finish ==> PageOutput#finish is designed to be called from outside at last. Other plugins look like that. PreviewExecutor seems an exception.

It seems to me that the right fix is to change how to escape from preview in PreviewExecutor when the specified number of rows have been read. What do you think?

The library is designed under the assumption above that finish is designed to be called from outside.

dmikurube commented 7 years ago

@muga I'm just wondering if you could add a comment there such as

// When failing around |PageBuidler| in |ingestServiceData|, |pageBuilder.finish()| should not be called.

(No need to be as-is.)

muga commented 7 years ago

@dmikurube Thank you for reviewing.

It seems to me that the right fix is to change how to escape from preview in PreviewExecutor when the specified number of rows have been read. What do you think?

Agree. As talked on 1on1, we should fix 2 things:

  1. Fix PageOutput#add in SamplingPageOutput and not call the finish.
  2. Fix RestClientInputPluginBaseUnsafe#run and not call pageBuilder.finish when failing ingestServiceData method call. (this PR)

I will file a ticket for 1.