GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow

Backport: [BEAM-1160] Fail to split in FileBasedSource if filePattern expands to empty #509

Closed swegner closed 7 years ago

swegner commented 7 years ago

Input file patterns are typically validated during Pipeline construction, but the standard Read transforms include an option to disable validation. Disabling validation is generally useful, but it means a Pipeline whose file pattern matches no files can execute successfully on empty input.

This change makes execution fail when a FileBasedSource is split and its filePattern expands to no files, even when validation is disabled.
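For readers unfamiliar with the failure mode, here is a minimal standalone sketch of the idea, not the SDK code itself. The class `EmptyPatternCheck`, the method `expandOrFail`, and the choice of `IOException` are assumptions made for illustration; it uses only `java.nio.file` rather than the FileBasedSource API. It expands a glob and refuses to continue if nothing matches, instead of silently producing zero splits.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not the SDK's FileBasedSource. */
public class EmptyPatternCheck {

    /**
     * Expands a glob inside a directory and fails if nothing matches,
     * rather than returning an empty list that would run "successfully".
     */
    public static List<Path> expandOrFail(Path dir, String glob) throws IOException {
        List<Path> matches = new ArrayList<>();
        // Files.newDirectoryStream(Path, String) filters entries by glob.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                matches.add(p);
            }
        }
        if (matches.isEmpty()) {
            // The exception type and message are assumptions for this sketch.
            throw new IOException(
                "File pattern " + glob + " in " + dir + " matched no files");
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("empty-pattern-demo");
        Files.createFile(dir.resolve("part-00000.txt"));

        // One matching file: expansion succeeds.
        System.out.println("matched: " + expandOrFail(dir, "*.txt").size());

        // No matching files: expansion fails instead of yielding empty input.
        try {
            expandOrFail(dir, "*.csv");
        } catch (IOException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

The check sits at expansion time rather than at construction time, which is why it still applies when up-front validation has been turned off.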

swegner commented 7 years ago

Optimistic backport assuming https://github.com/apache/incubator-beam/pull/1621 is ready to go.

R: @dhalperi