GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow

Backport apache/beam #3669 #591

Closed tgroh closed 7 years ago

tgroh commented 7 years ago

Configure BZIP2 to read all "streams"

Without this change, CompressionMode.BZIP2 supports only "standard" bz2 files that contain a single stream. With this change, BZIP2 also supports bz2 files that contain multiple streams, such as those produced by pbzip2.
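
To illustrate the difference between single-stream and multi-stream reading, here is a minimal sketch. It is not the actual patch. It assumes Apache Commons Compress is on the classpath and uses the `decompressConcatenated` flag of its `BZip2CompressorInputStream` constructor. The class `MultiStreamBzip2` and its helper methods are hypothetical names invented for this example and are not part of the SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;

// Hypothetical demo class; not part of the Dataflow SDK.
public class MultiStreamBzip2 {

  /** Compresses the text into one standalone bz2 stream. */
  static byte[] compress(String text) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (BZip2CompressorOutputStream out = new BZip2CompressorOutputStream(bytes)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    }
    return bytes.toByteArray();
  }

  /** Joins two byte arrays, mimicking a file made of back-to-back bz2 streams. */
  static byte[] concat(byte[] first, byte[] second) {
    byte[] joined = new byte[first.length + second.length];
    System.arraycopy(first, 0, joined, 0, first.length);
    System.arraycopy(second, 0, joined, first.length, second.length);
    return joined;
  }

  /**
   * Decompresses the data. When decompressConcatenated is false, reading stops
   * at the end of the first stream; when true, every stream is read in order.
   */
  static String decompress(byte[] data, boolean decompressConcatenated) throws IOException {
    ByteArrayOutputStream result = new ByteArrayOutputStream();
    try (InputStream in =
        new BZip2CompressorInputStream(new ByteArrayInputStream(data), decompressConcatenated)) {
      byte[] buffer = new byte[4096];
      int n;
      while ((n = in.read(buffer)) != -1) {
        result.write(buffer, 0, n);
      }
    }
    return new String(result.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    byte[] multiStream = concat(compress("first\n"), compress("second\n"));
    // Single-stream mode: only the contents of the first stream are returned.
    System.out.print(decompress(multiStream, false));
    // Multi-stream mode: the contents of both streams are returned.
    System.out.print(decompress(multiStream, true));
  }
}
```

In single-stream mode the trailing streams are silently ignored rather than rejected, so a pipeline reading a pbzip2-produced file would see truncated input without any error.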