apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

[Discussion] Data Pipeline Intermediate Representation in MXNet/NNVM #8589

Status: Open. kevinthesun opened this issue 7 years ago

kevinthesun commented 7 years ago

TensorFlow has a transform package (https://github.com/tensorflow/transform) that can export a data preprocessing pipeline to a TensorFlow graph, which can then be incorporated into the network graph. This package provides a neat way to manage the data pipeline together with the network graph, and it eliminates the gap between data preprocessing at training time and at inference time (especially for serving applications). Also, might we get some performance improvement on large data streams by running data processing as a computation graph rather than imperatively?
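To make the tf.transform idea concrete: preprocessing is split into an analyze phase, which makes a full pass over the training data to compute statistics, and a per-example transform that uses only those stored constants. Because serving reuses the same constants and the same function, training and inference cannot drift apart. The following is a minimal pure-Python sketch of that split. It assumes a single numeric feature and z-score normalization, and `analyze` and `transform` are hypothetical names, not tf.transform or MXNet APIs.

```python
import statistics
from typing import Dict, List


def analyze(train_values: List[float]) -> Dict[str, float]:
    """Full pass over the training data: compute the statistics once.

    The returned constants are all the transform needs, so they could be
    stored next to the model (e.g. serialized alongside the network graph).
    """
    return {
        "mean": statistics.fmean(train_values),
        "std": statistics.pstdev(train_values),  # population std
    }


def transform(value: float, consts: Dict[str, float]) -> float:
    """Per-example z-score that uses only the baked-in constants.

    It never looks at other examples, so it gives the same result during
    training and at serving time.
    """
    std = consts["std"] or 1.0  # guard against a constant feature (std == 0)
    return (value - consts["mean"]) / std


train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
consts = analyze(train)                            # mean 5.0, std 2.0
print([transform(v, consts) for v in train][:3])   # [-1.5, -0.5, -0.5]
print(transform(11.0, consts))                     # serving-time request: 3.0
```

In a graph-based design, `transform` would be expressed as graph operators with `mean` and `std` as constant inputs, which is what lets it be fused into the network graph.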

Currently in MXNet, if I want to do something similar, I need to package the code (most of the time a Python script) directly with the network graph files. This approach has some issues:

  1. Potential security issues. If I write the processing code and I am the only person using it, that's fine. However, if someone else wants to reuse it in their application, they need to review the code to make sure it has no security issues. This makes the code hard to reuse.

  2. It is bound to a specific language. It is usually easier to develop deep learning applications in Python, but if my production environment doesn't have Python, I need to either set up a Python environment or rewrite the script in a language that my production environment supports.

Any thoughts about supporting a data pipeline IR in MXNet/NNVM?
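Both issues come from shipping the pipeline as code rather than as data. Here is a minimal sketch of the alternative. It assumes a hypothetical JSON format (a list of `{"op", "args"}` steps) and a fixed operator whitelist, and none of these names exist in MXNet or NNVM. Because the file is only data, any runtime (C++, Scala, Go, ...) could implement the same small set of ops, and the loader refuses anything it does not recognize instead of executing it.

```python
import json
import math
from typing import Callable, Dict, List

# Hypothetical whitelist of operators. Loading a pipeline can only ever
# call these, unlike importing or exec-ing a user-supplied script.
OPS: Dict[str, Callable[..., float]] = {
    "sub": lambda x, c: x - c,
    "div": lambda x, c: x / c,
    "clip": lambda x, lo, hi: min(max(x, lo), hi),
    "log1p": lambda x: math.log1p(x),
}


def load_pipeline(text: str) -> List[dict]:
    """Parse a JSON pipeline and reject any operator not in OPS."""
    steps = json.loads(text)
    for step in steps:
        if step.get("op") not in OPS:
            raise ValueError(f"unknown op: {step.get('op')!r}")
    return steps


def run(steps: List[dict], x: float) -> float:
    """Apply each step in order, passing its constant arguments."""
    for step in steps:
        x = OPS[step["op"]](x, *step.get("args", []))
    return x


spec = (
    '[{"op": "sub", "args": [5.0]},'
    ' {"op": "div", "args": [2.0]},'
    ' {"op": "clip", "args": [-3.0, 3.0]}]'
)
pipeline = load_pipeline(spec)
print(run(pipeline, 1.0))   # (1 - 5) / 2 = -2.0
print(run(pipeline, 20.0))  # 7.5, clipped to 3.0

try:
    load_pipeline('[{"op": "exec", "args": ["import os"]}]')
except ValueError as err:
    print(err)  # unknown op: 'exec'
```

A real IR would also need typed tensors, shapes, and batching. This only illustrates the data-not-code principle behind the request.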

zhreshold commented 7 years ago

Data preprocessing operators are under development. Until they are available, it's hard to pack the data pipeline into the graph.