Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Use a NavigableSet Instead of a PriorityQueue in WatermarkManager #475
Closed
tgroh closed 8 years ago
This replaces an O(n) call to `remove` on a `PriorityQueue` with an O(log(n)) call to `remove` on a `NavigableSet`. The change significantly improves the scaling behavior of the DirectRunner.
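To illustrate the complexity difference, here is a minimal sketch that is not taken from `WatermarkManager`. The `Pending` class, the `ORDER` comparator, and both helper methods are hypothetical names introduced for this example, and `TreeSet` is assumed as the `NavigableSet` implementation. Both helpers return the earliest remaining timestamp after one element is removed; only the cost of the removal differs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.PriorityQueue;
import java.util.TreeSet;

class NavigableSetSketch {
    // Hypothetical stand-in for an element that holds back a watermark.
    static final class Pending {
        final long timestamp;
        final int id;

        Pending(long timestamp, int id) {
            this.timestamp = timestamp;
            this.id = id;
        }
    }

    // Order by timestamp, then by id. The tie-break matters for the set:
    // a TreeSet drops elements that compare as equal, a PriorityQueue does not.
    static final Comparator<Pending> ORDER =
        Comparator.<Pending>comparingLong(p -> p.timestamp).thenComparingInt(p -> p.id);

    // PriorityQueue.remove(Object) scans the backing array: O(n).
    static long earliestWithQueue(List<Pending> all, Pending done) {
        PriorityQueue<Pending> queue = new PriorityQueue<>(ORDER);
        queue.addAll(all);
        queue.remove(done);
        return queue.peek().timestamp;
    }

    // TreeSet.remove(Object) walks a balanced tree: O(log(n)).
    static long earliestWithSet(List<Pending> all, Pending done) {
        NavigableSet<Pending> set = new TreeSet<>(ORDER);
        set.addAll(all);
        set.remove(done);
        return set.first().timestamp;
    }

    public static void main(String[] args) {
        List<Pending> all = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            all.add(new Pending(10L * i, i));
        }
        Pending done = all.get(0); // the element with the earliest timestamp
        System.out.println(earliestWithQueue(all, done));
        System.out.println(earliestWithSet(all, done));
    }
}
```

Both structures give the minimum element cheaply (`peek()` or `first()`), so the set keeps the access pattern of the queue. The trade-off is that a set needs a total order in which distinct elements never compare as equal, which is why the sketch adds the `id` tie-break.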
Backports https://github.com/apache/incubator-beam/pull/1202