GoogleCloudPlatform / DataflowJavaSDK

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
http://cloud.google.com/dataflow
855 stars 324 forks source link

Can Dining Philosophers problem be implemented in DtataflowJavaSDK? #40

Closed akaigoro closed 9 years ago

akaigoro commented 9 years ago

subj

davorbonaci commented 9 years ago

Sure, the dining philosophers issue, i.e., a deadlock or a livelock, can occur in any system where multiple workers are accessing shared resources concurrently. For example, a pipeline containing a single DoFn that may acquire two locks in a random order could run into this issue. The "locks" are assumed to be global to the Dataflow worker VMs.

akaigoro commented 9 years ago

I mean, is it possible to write correct implementation using DtataflowJavaSDK?

davorbonaci commented 9 years ago

Sure, there are several approaches that can result in "correct implementations". Resource hierarchy solution requires acquiring locks in a partial order and doesn't require any particular support from the system itself.

Other solutions may require some additional work. For example, arbitrator solution could be implemented by having Datataflow workers communicating with a global arbitrator, which can be built as a service outside Dataflow. Solutions which require communication between workers, like Chandy/Misra solution, are harder.

In principle, however, Dataflow programming model is higher layer and very different than a plain message-passing system where users coordinate work between workers. In Dataflow, workers shouldn't be talking to each other. Please check-out our programming model documentation. Perhaps it would be beneficial to focus on the underlying data-processing problem you would like to solve with Dataflow and how to express it using Dataflow principles, instead of an implementation detail of deadlocks during execution.