Open jornfranke opened 7 years ago
It seems that we can use HadoopInputFormatIO to read: https://beam.apache.org/documentation/sdks/javadoc/2.0.0/org/apache/beam/sdk/io/hadoop/inputformat/HadoopInputFormatIO.html
It seems that we can use HDFSFileSink to write: https://beam.apache.org/documentation/sdks/javadoc/0.6.0/org/apache/beam/sdk/io/hdfs/HDFSFileSink.html
The new classes are FileBasedSource for reading (https://beam.apache.org/documentation/sdks/javadoc/2.1.0/org/apache/beam/sdk/io/FileBasedSource.html) and FileBasedSink for writing (https://beam.apache.org/documentation/sdks/javadoc/2.1.0/org/apache/beam/sdk/io/FileBasedSink.html).
Investigate support for using HadoopOffice with Apache Beam (https://beam.apache.org/), and create examples and unit tests for it.
Apache Beam supports writing Big Data jobs once and running them on multiple platforms (e.g. Flink, Spark, Apex, Google Cloud Dataflow).