deeplearning4j / deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...
http://deeplearning4j.konduit.ai
Apache License 2.0
13.7k stars 3.84k forks

Add example of spark streaming and flink streaming inference with dl4j #3071

Open raopku opened 7 years ago

raopku commented 7 years ago

I know DL4J can be used on Spark. However, I am not sure whether DL4J can also be used with Spark Streaming or Flink. Thank you.

agibsonccc commented 7 years ago

DL4J is embeddable, so technically yes. Our Spark integration is just a fancy map/reduce operation with some caching. Streaming could be similar. There is a current ticket out there for Flink: http://issues.apache.org/jira/browse/FLINK-5782

pmalipio commented 6 years ago

I think DataVec support for Flink may be in progress: https://github.com/deeplearning4j/DataVec This may also help: https://www.slideshare.net/jpatanooga/building-deep-learning-workflows-with-dl4j

agibsonccc commented 6 years ago

@pmalipio are you interested in this? We'd be more than glad to guide a contribution to the examples, whether a basic example or even a proper integration with training/batch inference like the one for Spark. We know this is a hard problem and understand that the latter might be a bit much, though.

pmalipio commented 6 years ago

@agibsonccc Yes I am. I'd have to take a look to see how things were done for spark.

agibsonccc commented 6 years ago

@pmalipio here's a nice entry point: https://github.com/deeplearning4j/deeplearning4j/blob/master/deeplearning4j-scaleout/spark/dl4j-spark/src/main/java/org/deeplearning4j/spark/impl/multilayer/SparkDl4jMultiLayer.java alongside http://deeplearning4j.org/spark and http://deeplearning4j.org/distributed

The basic idea would be to map the same operations we did for Spark (parameter server and parameter averaging), along with the off-heap memory knobs, onto Flink.
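To make the parameter averaging idea concrete, here is a minimal sketch in plain Java. It is not the DL4J, Spark, or Flink implementation: the class `ParameterAveragingSketch`, the methods `localStep` and `average`, and the use of a flat `double[]` for a worker's parameters are all hypothetical stand-ins chosen for illustration. The "map" step is each worker updating its own copy of the parameters, and the "reduce" step is an element-wise mean of the results.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: plain arrays stand in for network parameters,
// and no DL4J, Spark, or Flink API is used.
class ParameterAveragingSketch {

    // "Map" step (toy): one gradient-descent-style update on a worker's local copy.
    static double[] localStep(double[] params, double[] gradient, double learningRate) {
        double[] updated = params.clone();
        for (int i = 0; i < updated.length; i++) {
            updated[i] -= learningRate * gradient[i];
        }
        return updated;
    }

    // "Reduce" step: element-wise mean of the parameter vectors returned by the workers.
    static double[] average(List<double[]> workerParams) {
        if (workerParams.isEmpty()) {
            throw new IllegalArgumentException("no worker parameters to average");
        }
        int length = workerParams.get(0).length;
        double[] mean = new double[length];
        for (double[] params : workerParams) {
            if (params.length != length) {
                throw new IllegalArgumentException("parameter length mismatch");
            }
            for (int i = 0; i < length; i++) {
                mean[i] += params[i];
            }
        }
        for (int i = 0; i < length; i++) {
            mean[i] /= workerParams.size();
        }
        return mean;
    }

    public static void main(String[] args) {
        double[] shared = {1.0, 2.0};
        // Each worker starts from the shared parameters and applies its own gradient.
        double[] worker1 = localStep(shared, new double[] {1.0, 0.0}, 0.5);
        double[] worker2 = localStep(shared, new double[] {0.0, 2.0}, 0.5);
        // The averaged vector becomes the shared parameters for the next round.
        System.out.println(Arrays.toString(average(Arrays.asList(worker1, worker2))));
    }
}
```

Porting this to another engine would mostly mean deciding where the local update runs and how the averaged vector gets back to the workers; the averaging itself does not depend on the engine.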

The other challenge will be mapping datavec primitives on to the spark DSL. What you would have to do there is take a closer look at https://github.com/deeplearning4j/DataVec/blob/master/datavec-spark/src/main/java/org/datavec/spark/transform/SparkTransformExecutor.java as your entry point for datavec.

Most of the things in dl4j are actually very independent of the implementation in spark.

Everything from our memory management to the DSL for data processing is independent. Happy to help break this down for you if there is a chunk of it you'd like to attempt in order to gain an understanding of the process.

pmalipio commented 6 years ago

I probably won't find the spare time to do this. I think the most useful thing here would be to support the Flink DataStream API, so that DL can be applied to real-time streams. Shouldn't this be as easy as passing each Flink stream event through the network in a map operation?
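A minimal sketch of that per-event mapping idea, in plain Java with no Flink or DL4J dependency. Everything here is an assumption made for illustration: `StreamInferenceSketch`, `LinearModel`, and `score` are hypothetical names, a fixed linear scorer stands in for a trained network, and a `java.util.stream` pipeline stands in for a streaming map operator. The point it tries to show is that the model is built once and then reused for every event, rather than being loaded inside the per-event function.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical illustration only: no Flink or DL4J API is used here.
class StreamInferenceSketch {

    // Stand-in for a trained network: a fixed linear scorer over a feature vector.
    static final class LinearModel implements Function<double[], Double> {
        private final double[] weights;
        private final double bias;

        LinearModel(double[] weights, double bias) {
            this.weights = weights.clone();
            this.bias = bias;
        }

        @Override
        public Double apply(double[] features) {
            if (features.length != weights.length) {
                throw new IllegalArgumentException("feature length mismatch");
            }
            double score = bias;
            for (int i = 0; i < weights.length; i++) {
                score += weights[i] * features[i];
            }
            return score;
        }
    }

    // Pass each event through the model, one output per input, preserving order.
    static List<Double> score(List<double[]> events, Function<double[], Double> model) {
        return events.stream().map(model).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Build the model once, outside the per-event path.
        LinearModel model = new LinearModel(new double[] {2.0, -1.0}, 0.5);
        List<double[]> events = Arrays.asList(
                new double[] {1.0, 1.0},
                new double[] {0.0, 2.0});
        System.out.println(score(events, model));
    }
}
```

In a real streaming job the open questions are probably less about the map itself and more about where the model is loaded, how events are batched for throughput, and how off-heap memory is configured on each worker.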