raopku opened this issue 7 years ago
DL4J is embeddable, so technically yes. Our Spark integration is just a fancy map/reduce operation with some caching, and streaming could work similarly. There is an existing ticket for Flink: http://issues.apache.org/jira/browse/FLINK-5782
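For anyone unfamiliar with the pattern, here is a rough sketch of what map/reduce training with parameter averaging looks like. It is a simplified illustration under stated assumptions, not the SparkDl4jMultiLayer code: plain Java with no Spark or DL4J dependency, a toy linear model y = w*x + b standing in for the network, and in-memory arrays standing in for partitions. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: parameter averaging as repeated map/reduce rounds.
// A toy linear model y = w*x + b stands in for a real network.
class ParameterAveragingSketch {

    // Build one partition of noise-free points on y = 2x + 1, with x = i / 10.
    static double[][] linePartition(int from, int to) {
        double[][] rows = new double[to - from][];
        for (int i = from; i < to; i++) {
            double x = i / 10.0;
            rows[i - from] = new double[] {x, 2.0 * x + 1.0};
        }
        return rows;
    }

    // Map step: a worker runs plain SGD on its own partition, starting from the shared parameters.
    static double[] fitPartition(double[][] partition, double[] start, double lr, int epochs) {
        double w = start[0];
        double b = start[1];
        for (int e = 0; e < epochs; e++) {
            for (double[] row : partition) {
                double err = (w * row[0] + b) - row[1];
                w -= lr * err * row[0];
                b -= lr * err;
            }
        }
        return new double[] {w, b};
    }

    // Reduce step: element-wise mean of the workers' parameter vectors.
    static double[] average(List<double[]> params) {
        double[] avg = new double[params.get(0).length];
        for (double[] p : params) {
            for (int i = 0; i < avg.length; i++) {
                avg[i] += p[i] / params.size();
            }
        }
        return avg;
    }

    // Several rounds of map (local fit from the same start) followed by reduce (average).
    static double[] train(List<double[][]> partitions, int rounds) {
        double[] global = {0.0, 0.0};
        for (int r = 0; r < rounds; r++) {
            final double[] start = global;
            List<double[]> local = partitions.parallelStream()
                    .map(p -> fitPartition(p, start, 0.05, 5))
                    .collect(Collectors.toList());
            global = average(local);
        }
        return global;
    }

    public static void main(String[] args) {
        List<double[][]> partitions = new ArrayList<>();
        partitions.add(linePartition(0, 10));  // "worker" 1
        partitions.add(linePartition(10, 20)); // "worker" 2
        double[] p = train(partitions, 300);
        System.out.printf("w = %.3f, b = %.3f%n", p[0], p[1]);
    }
}
```

Each round, every partition starts from the same global parameters and trains independently (the map), then the results are averaged into the new global parameters (the reduce). Because the toy data is noise-free, the printed values should end up close to w = 2 and b = 1.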
I think DataVec support for Flink may be in progress (see https://github.com/deeplearning4j/DataVec). This may also help: https://www.slideshare.net/jpatanooga/building-deep-learning-workflows-with-dl4j
@pmalipio Are you interested in this? We'd be more than glad to guide a contribution, either as a basic example in the examples repo or even as a proper integration with training and batch inference like the Spark one. We know this is a hard problem, though, and understand that it might be a bit much.
@agibsonccc Yes, I am. I'd have to take a look at how things were done for Spark.
@pmalipio Here's a good entry point: https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-scaleout/spark/dl4j-spark/src/main/java/org/deeplearning4j/spark/impl/multilayer/SparkDl4jMultiLayer.java along with http://deeplearning4j.org/spark and http://deeplearning4j.org/distributed
The basic idea would be to map the same operations we implemented for Spark (parameter server and parameter averaging), along with the off-heap memory knobs, onto Flink.
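To make the parameter server half of that concrete, here is a minimal in-process sketch under stated assumptions: threads stand in for cluster workers, a synchronized object stands in for the server, and a toy linear model y = w*x + b replaces the network. Workers pull the current parameters, compute a gradient on their own shard, and push it back asynchronously, so a gradient may be computed from slightly stale parameters. All names are hypothetical, and this does not reflect how DL4J implements its parameter server.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: asynchronous parameter-server training with threads as workers.
class ParameterServerSketch {

    // Holds the global parameters; pull and push are synchronized so updates don't interleave.
    static final class ParameterServer {
        private final double[] params;
        private final double lr;

        ParameterServer(int size, double lr) {
            this.params = new double[size];
            this.lr = lr;
        }

        synchronized double[] pull() {
            return params.clone();
        }

        synchronized void push(double[] grad) {
            for (int i = 0; i < params.length; i++) {
                params[i] -= lr * grad[i];
            }
        }
    }

    // One shard of noise-free points on y = 2x + 1, with x = i / 10.
    static double[][] linePartition(int from, int to) {
        double[][] rows = new double[to - from][];
        for (int i = from; i < to; i++) {
            double x = i / 10.0;
            rows[i - from] = new double[] {x, 2.0 * x + 1.0};
        }
        return rows;
    }

    // Mean-squared-error gradient of y = w*x + b over one shard.
    static double[] gradient(double[] p, double[][] shard) {
        double gw = 0.0;
        double gb = 0.0;
        for (double[] row : shard) {
            double err = (p[0] * row[0] + p[1]) - row[1];
            gw += err * row[0];
            gb += err;
        }
        return new double[] {gw / shard.length, gb / shard.length};
    }

    static double[] train(List<double[][]> shards, int stepsPerWorker) {
        ParameterServer server = new ParameterServer(2, 0.1);
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        for (double[][] shard : shards) {
            // Each worker loops: pull current params, compute a local gradient, push it.
            pool.submit(() -> {
                for (int s = 0; s < stepsPerWorker; s++) {
                    server.push(gradient(server.pull(), shard));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return server.pull();
    }

    public static void main(String[] args) {
        double[] p = train(List.of(linePartition(0, 10), linePartition(10, 20)), 3000);
        System.out.printf("w = %.3f, b = %.3f%n", p[0], p[1]);
    }
}
```

Compared with parameter averaging, workers here never wait for each other; the trade-off is that updates can be based on stale parameters, which is tolerable when the learning rate is small.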
The other challenge will be mapping DataVec primitives onto the Flink DSL. For that, take a closer look at https://github.com/deeplearning4j/DataVec/blob/master/datavec-spark/src/main/java/org/datavec/spark/transform/SparkTransformExecutor.java as your entry point for DataVec.
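The general shape of that mapping can be sketched without any engine: express each per-record transform step as a function, compose the steps in order, and hand the composed function to whatever map operator the engine provides. This is a hypothetical illustration only; records are simplified to lists of strings, the step names are made up, and none of it is DataVec's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: compose per-record transform steps into one function that any engine's map can run.
class TransformPipelineSketch {

    // Step: drop the column at the given index (records are lists of column values).
    static Function<List<String>, List<String>> removeColumn(int index) {
        return row -> {
            List<String> out = new ArrayList<>(row);
            out.remove(index);
            return out;
        };
    }

    // Step: lower-case the column at the given index.
    static Function<List<String>, List<String>> lowerCase(int index) {
        return row -> {
            List<String> out = new ArrayList<>(row);
            out.set(index, out.get(index).toLowerCase(Locale.ROOT));
            return out;
        };
    }

    // Chain the steps in order into a single record-level function.
    static Function<List<String>, List<String>> compose(List<Function<List<String>, List<String>>> steps) {
        Function<List<String>, List<String>> pipeline = Function.identity();
        for (Function<List<String>, List<String>> step : steps) {
            pipeline = pipeline.andThen(step);
        }
        return pipeline;
    }

    // Stand-in for an engine's map operator (a Flink or Spark map would play this role).
    static List<List<String>> run(List<List<String>> records, Function<List<String>, List<String>> pipeline) {
        return records.stream().map(pipeline).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Function<List<String>, List<String>> pipeline = compose(List.of(removeColumn(2), lowerCase(1)));
        List<List<String>> records = List.of(List.of("1", "NYC", "ignored"), List.of("2", "Lisbon", "ignored"));
        System.out.println(run(records, pipeline)); // [[1, nyc], [2, lisbon]]
    }
}
```

Steps that only look at one record port easily this way; steps that need global state (for example, statistics computed over the whole dataset) would need an engine-specific aggregation first.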
Most of DL4J is actually independent of the Spark implementation.
Everything from our memory management to the DSL for data processing is independent of it. I'm happy to help break this down for you if there's a chunk of it you'd like to attempt, so you can get an understanding of the process.
I probably won't find the spare time to do this. I think the most useful thing here would be support for the Flink DataStream API, to apply DL to real-time streams. Shouldn't this be as easy as passing each Flink stream event through the network (i.e., a map)?
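Roughly, yes: scoring each event independently is just a map over the stream. A minimal sketch, assuming a hypothetical TinyModel (a single dense unit with a sigmoid) in place of a trained DL4J network and a java.util.stream.Stream in place of a Flink DataStream. In a real job the model would be loaded once per task (for example in a Flink RichMapFunction's open method) rather than per event, and the per-event call would be something like DL4J's MultiLayerNetwork.output(...).

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: per-event inference expressed as a map over a stream of feature vectors.
class StreamingInferenceSketch {

    // Stand-in for a trained network: one dense unit with a sigmoid activation.
    static final class TinyModel implements Function<double[], Double> {
        private final double[] weights;
        private final double bias;

        TinyModel(double[] weights, double bias) {
            this.weights = weights.clone();
            this.bias = bias;
        }

        @Override
        public Double apply(double[] features) {
            double z = bias;
            for (int i = 0; i < weights.length; i++) {
                z += weights[i] * features[i];
            }
            return 1.0 / (1.0 + Math.exp(-z));
        }
    }

    // Score every event as it arrives; each event is handled independently of the others.
    static List<Double> scoreAll(Stream<double[]> events, TinyModel model) {
        return events.map(model).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TinyModel model = new TinyModel(new double[] {1.0, -1.0}, 0.0);
        Stream<double[]> events = Stream.of(new double[] {2.0, 2.0}, new double[] {3.0, 1.0});
        System.out.println(scoreAll(events, model));
    }
}
```

The main trade-off is throughput: scoring one event at a time keeps latency low, but grouping events into small batches usually makes better use of the underlying matrix libraries.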
I know DL4J can be used on Spark. However, I am not sure whether DL4J can be used with Spark Streaming or Flink. Thank you!