GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow

Use a NavigableSet Instead of a PriorityQueue in WatermarkManager #475

Closed. tgroh closed this 8 years ago

tgroh commented 8 years ago

This removes an O(n) call to PriorityQueue.remove(Object), replacing it with an O(log(n)) removal from a NavigableSet. This significantly improves the scaling behavior of the DirectRunner.
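For illustration, the difference can be sketched roughly as follows. This is a minimal standalone example, not the WatermarkManager code: the `Pending` class, its `id` tie-breaker, and the `ORDER` comparator are hypothetical names, and it assumes a `TreeSet` as the `NavigableSet` implementation.

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.PriorityQueue;
import java.util.TreeSet;

public class PendingSetSketch {
    /** Hypothetical stand-in for a pending element; not a class from the SDK. */
    static final class Pending {
        final long timestamp;
        final long id; // unique tie-breaker (an assumption of this sketch)

        Pending(long timestamp, long id) {
            this.timestamp = timestamp;
            this.id = id;
        }
    }

    // Order by timestamp, then by id, so that distinct elements never compare
    // as equal. A TreeSet treats compare() == 0 as a duplicate and would
    // silently refuse to add the second element.
    static final Comparator<Pending> ORDER =
        Comparator.<Pending>comparingLong(p -> p.timestamp)
            .thenComparingLong(p -> p.id);

    /** Returns the earliest pending timestamp, or Long.MAX_VALUE if none. */
    static long earliestTimestamp(NavigableSet<Pending> pending) {
        return pending.isEmpty() ? Long.MAX_VALUE : pending.first().timestamp;
    }

    public static void main(String[] args) {
        Pending a = new Pending(10L, 1L);
        Pending b = new Pending(20L, 2L);
        Pending c = new Pending(20L, 3L);

        // Before: PriorityQueue.remove(Object) scans the backing array, O(n).
        PriorityQueue<Pending> queue = new PriorityQueue<>(ORDER);
        queue.add(a);
        queue.add(b);
        queue.add(c);
        queue.remove(a);

        // After: TreeSet.remove(Object) walks the tree, O(log(n)).
        NavigableSet<Pending> set = new TreeSet<>(ORDER);
        set.add(a);
        set.add(b);
        set.add(c);
        set.remove(a);

        // Both structures still expose the minimum element cheaply.
        System.out.println(queue.peek().timestamp);
        System.out.println(earliestTimestamp(set));
    }
}
```

The tie-breaker matters with a set: two elements with the same timestamp must not compare as equal, or one of them would be dropped on insert.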

Backports https://github.com/apache/incubator-beam/pull/1202

kennknowles commented 8 years ago

LGTM. Feel free to self-merge after fixing checkstyle. Might just want to get Travis to pass.