Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Decompressing bzip2 files with multiple "streams" only reads the first stream, leading to data loss #596
Open · lukecwik opened this issue 7 years ago
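This issue was originally found in Apache Beam (BEAM-2708, https://issues.apache.org/jira/browse/BEAM-2708) and also affects Dataflow SDK for Java versions 1.6.0 through 1.9.0. When a bzip2 file consists of several concatenated compressed streams, only the data from the first stream is read and the remaining streams are silently dropped.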
The fix has been backported to the Dataflow SDK in https://github.com/GoogleCloudPlatform/DataflowJavaSDK/pull/592.
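To illustrate the failure mode outside the SDK, here is a minimal sketch that uses Python's standard `bz2` module instead of the Dataflow Java code. The function names `decompress_first_stream_only` and `decompress_all_streams` are hypothetical. A single `BZ2Decompressor` stops at the first end-of-stream marker and puts any following bytes in `unused_data`. A reader that ignores those bytes reproduces the data loss described above. A reader that loops, starting a fresh decompressor on each leftover chunk, recovers every stream.

```python
import bz2


def decompress_first_stream_only(data: bytes) -> bytes:
    """Mimics the buggy behavior: stops at the first end-of-stream marker."""
    decompressor = bz2.BZ2Decompressor()
    # Bytes after the first stream are stored in decompressor.unused_data
    # and are silently dropped here, which is the data loss.
    return decompressor.decompress(data)


def decompress_all_streams(data: bytes) -> bytes:
    """Decompresses every concatenated bzip2 stream in `data`."""
    output = []
    remaining = data
    while remaining:
        # Each concatenated stream needs its own decompressor instance.
        decompressor = bz2.BZ2Decompressor()
        output.append(decompressor.decompress(remaining))
        if not decompressor.eof:
            raise ValueError("truncated bzip2 stream")
        # Whatever follows this stream's end marker starts the next stream.
        remaining = decompressor.unused_data
    return b"".join(output)


if __name__ == "__main__":
    # Two independently compressed streams joined together, as produced by
    # e.g. `cat a.bz2 b.bz2 > ab.bz2` or by parallel compressors like pbzip2.
    multi_stream = bz2.compress(b"first stream\n") + bz2.compress(b"second stream\n")
    print(decompress_first_stream_only(multi_stream))  # b'first stream\n'
    print(decompress_all_streams(multi_stream))  # b'first stream\nsecond stream\n'
```

Python's own `bz2.decompress` already handles multiple streams this way. The loop is written out here only to make the per-stream restart explicit.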