GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow

Fix HadoopFileSource’s split size estimate #534

igorbernstein2 closed 7 years ago

igorbernstein2 commented 7 years ago

Fixed handling of InterruptedException. However, I don't think this fix applies to Beam, since I don't think HadoopFileSource ever made it to Apache Beam.

dhalperi commented 7 years ago

It got renamed to HDFSFileSource: https://github.com/apache/beam/blob/master/sdks/java/io/hdfs/src/main/java/org/apache/beam/sdk/io/hdfs/HDFSFileSource.java

igorbernstein2 commented 7 years ago

Thanks for pointing it out. Will submit a PR there as well.