FRosner / spawncamping-dds

Data-Driven Spark allows quick data exploration based on Apache Spark.

Allow multiple servers to subscribe to DDS core? #252

Closed: FRosner closed this issue 9 years ago

FRosner commented 9 years ago

Problem

Currently, the only way to look at the results of DDS core functions is the SprayServer, which is started automatically when you start DDS. However, one might want to look at the data in a different way, e.g. write it to a file or show it in a notebook (see #111 and #190).

Proposed Solution

Instead of directly linking the DDS.start method to a SprayServer, we should redesign the way DDS is called. One possibility is a publish-subscribe pattern: subscribers (e.g. the SprayServer) can be added to the DDS core at runtime. Every time a DDS function is called (e.g. DDS.bar(df)), the resulting case class is published to all subscribers.
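A minimal Scala sketch of the publish-subscribe idea, assuming names that are purely hypothetical and not the actual DDS API (`Subscriber`, `DDSCore`, `subscribe`, `BarChart`, `CollectingSubscriber`). To keep it runnable without Spark, `bar` takes a plain `Seq[String]` instead of a DataFrame.

```scala
// Hypothetical publish-subscribe sketch; none of these names are the real DDS API.
import scala.collection.mutable

// Result of a core function, published to every subscriber
final case class BarChart(counts: Map[String, Long])

// Anything that wants to see DDS results (a web server, a file writer, ...)
trait Subscriber {
  def receive(result: Any): Unit
}

object DDSCore {
  private val subscribers = mutable.ListBuffer.empty[Subscriber]

  // Subscribers can be registered at runtime
  def subscribe(s: Subscriber): Unit = subscribers += s

  // Stand-in for DDS.bar(df): counts occurrences of each value,
  // then publishes the result to all registered subscribers
  def bar(values: Seq[String]): Unit = {
    val result = BarChart(values.groupBy(identity).map { case (k, v) => k -> v.size.toLong })
    subscribers.foreach(_.receive(result))
  }
}

// Example subscriber that simply collects everything it receives
class CollectingSubscriber extends Subscriber {
  val received = mutable.ListBuffer.empty[Any]
  def receive(result: Any): Unit = received += result
}

val collector = new CollectingSubscriber
DDSCore.subscribe(collector)
DDSCore.bar(Seq("a", "b", "a"))
```

In this sketch `bar` returns `Unit`, so the only way to see the result is through a registered subscriber.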

There will always be a default "subscriber": the spark-shell. This guarantees that no results are lost, even if no subscriber is registered. This can be done simply by returning the result of the function instead of Unit, which is what the functions currently return.
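A hypothetical Scala sketch of the default "subscriber" idea, assuming illustrative names (`Subscriber`, `DDSCore`, `BarChart`) that are not the actual DDS API. The core function publishes its result and also returns it, so the shell prints the value even when no subscriber is registered.

```scala
// Hypothetical sketch; these names are illustrative, not the real DDS API.
import scala.collection.mutable

final case class BarChart(counts: Map[String, Long])

trait Subscriber {
  def receive(result: Any): Unit
}

object DDSCore {
  private val subscribers = mutable.ListBuffer.empty[Subscriber]

  def subscribe(s: Subscriber): Unit = subscribers += s

  // Publishes to all subscribers AND returns the result instead of Unit,
  // so the shell sees it even when no subscriber is registered
  def bar(values: Seq[String]): BarChart = {
    val result = BarChart(values.groupBy(identity).map { case (k, v) => k -> v.size.toLong })
    subscribers.foreach(_.receive(result))
    result
  }
}

// No subscriber registered: the returned value is still available to the shell
val chart = DDSCore.bar(Seq("x", "y", "x"))
println(chart)
```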

Subscriber Examples

Open Questions

FRosner commented 9 years ago

@Gerrrr @andypetrella do you think that with this change we would be able to provide a spark-notebook subscriber? You could then subscribe the notebook to DDS in one of the cells and let the subscriber implementation do all the Chart and Point magic of the notebook?

Is it possible to implement something like this? What would this subscriber need to do in order to show the plot in the cell next to the DDS call? Is it even possible? If it just listens to DDS, it cannot know which cell to put the output in, can it?

Relates to https://github.com/andypetrella/spark-notebook/issues/322

FRosner commented 9 years ago

@RPCMoritz could we perform a similar thing for Zeppelin?

FRosner commented 9 years ago

After discussing with @Gerrrr, we came to the conclusion that we might not need multiple subscribers, especially because Spark-Notebook requires the result to be returned by the function in order to plot it in a cell. This cannot be accomplished with a subscriber model: the subscriber does not know in which cell the code was called, and the return type of the core functions would be Unit.

So another option would be a serve method with a flexible return type (Any). You can then register a server (e.g. the Spray server) whose implementation of serve does the actual work. In Spark-Notebook, serve would transform the data (e.g. a bar chart) into a Chart object that knows how to render itself. The Spray server could instead transform it to JSON and push it to the JavaScript front-end.
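A rough Scala sketch of the serve-based design, assuming hypothetical names throughout (`Servable`, `Bar`, `Server`, `JsonServer`, `NotebookServer`, `Chart`, `DDSSketch`, `setServer`). `JsonServer` only stands in for the Spray server, and `Chart` for a notebook object that renders itself; neither is the real Spray or Spark-Notebook API.

```scala
// Hypothetical sketch of a pluggable "serve" method; all names are illustrative.

// Results produced by core functions
sealed trait Servable
final case class Bar(counts: Seq[(String, Long)]) extends Servable

trait Server {
  // Flexible return type: each server decides what to hand back to the caller
  def serve(servable: Servable): Any
}

// Stand-in for the Spray server: turns the result into JSON for a front-end.
// Naive JSON building for illustration only (labels are not escaped).
object JsonServer extends Server {
  def serve(servable: Servable): Any = servable match {
    case Bar(counts) =>
      counts.map { case (k, v) => s"""{"label":"$k","count":$v}""" }.mkString("[", ",", "]")
  }
}

// Stand-in for a notebook object that knows how to render itself in a cell
final case class Chart(labels: Seq[String], values: Seq[Long])

object NotebookServer extends Server {
  def serve(servable: Servable): Any = servable match {
    case Bar(counts) => Chart(counts.map(_._1), counts.map(_._2))
  }
}

object DDSSketch {
  private var server: Server = JsonServer

  def setServer(s: Server): Unit = server = s

  // Stand-in for DDS.bar(df): computes sorted counts, hands them to the
  // registered server and returns whatever the server produced
  def bar(values: Seq[String]): Any = {
    val counts = values.groupBy(identity).toSeq.sortBy(_._1).map { case (k, v) => k -> v.size.toLong }
    server.serve(Bar(counts))
  }
}

// Default server produces JSON, as a Spray-like server would
val json = DDSSketch.bar(Seq("b", "a", "b"))

// Switch to the notebook server: the same call now returns a renderable Chart
DDSSketch.setServer(NotebookServer)
val chart = DDSSketch.bar(Seq("b", "a", "b"))
```

Because `serve` returns `Any`, the caller of `bar` gets whatever the registered server produces, which is what would let a notebook cell display the result. The trade-off is that the return type carries no static information about what comes back.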

FRosner commented 9 years ago

Based on the discussion, I will create another issue with a proposal for reworking DDS to make it more modular, plus a first prototype that works with Spark-Notebook.

andypetrella commented 9 years ago

okay @FRosner, gotcha. Link the new issue here ^^

FRosner commented 9 years ago

@andypetrella https://github.com/FRosner/spawncamping-dds/issues/255 there you go :)