[Discussion] Data Pipeline Intermediate Representation in MXNet/NNVM

TensorFlow has a transform package (https://github.com/tensorflow/transform) that can export a data preprocessing pipeline as a TensorFlow graph, which can then be incorporated into the network graph. This package provides a neat way to manage the data pipeline together with the network graph, and it closes the gap between data preprocessing at training time and at inference time (especially for serving applications). Could we also get some performance improvement on large data streams by running data processing as a computation graph rather than imperatively?
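
To make the training/inference gap concrete, here is a minimal pure-Python sketch of the idea. It does not use the tf.Transform API. Statistics are computed once over the training data, frozen as constants into a serializable description, and that same description is applied at serving time. The names `analyze` and `apply_spec` and the layout of the spec are hypothetical, chosen only for illustration.

```python
import json
import statistics


def analyze(values):
    """Full pass over the training data; freeze the statistics as constants."""
    return {
        "op": "scale_to_z_score",
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),
    }


def apply_spec(spec, x):
    """Apply the frozen transform to one value (same code path for training and serving)."""
    if spec["op"] != "scale_to_z_score":
        raise ValueError(f"unsupported op: {spec['op']}")
    return (x - spec["mean"]) / spec["std"]


train = [2.0, 4.0, 6.0, 8.0]
spec = analyze(train)

# The spec is plain data, so it can be saved next to the network graph files.
saved = json.dumps(spec)

# A serving process loads the spec and never sees the training data.
serving_spec = json.loads(saved)

train_features = [apply_spec(spec, x) for x in train]
serving_feature = apply_spec(serving_spec, 4.0)
```

Because the serving side only reads the frozen constants, a value seen at inference time is transformed exactly as it was during training.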

Currently in MXNet, if I want to do something similar, I need to package the code (most of the time a Python script) directly with the network graph files. This approach has some issues:

Potential security issues. If I wrote the processing code and I am the only person using it, that's fine. However, if someone else wants to reuse it in their application, they need to audit the code to make sure it is safe to run. That makes the pipeline hard to share and reuse.
It is bound to a specific language. It's usually easier to develop deep learning applications in Python, but if my production environment doesn't have Python, I need to either set up a Python environment there or rewrite the script in a language that my production environment supports.
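
The two issues above hint at what a pipeline IR would need to provide. As a rough sketch under my own assumptions, the pipeline could be plain data (here a JSON list of op nodes) and the runtime could execute only ops from a fixed registry. Loading someone else's pipeline then never runs their code, and any language that implements the same op set can run it. The op names and JSON layout below are hypothetical and are not an existing MXNet/NNVM format.

```python
import json

# Fixed registry of supported ops (hypothetical op set).
# This is the only code that can ever run, no matter who wrote the pipeline.
OPS = {
    "subtract": lambda x, value: x - value,
    "divide": lambda x, value: x / value,
    "clip": lambda x, lo, hi: max(lo, min(hi, x)),
}


def run_pipeline(pipeline_json, x):
    """Interpret a JSON pipeline description on a single value."""
    for node in json.loads(pipeline_json):
        name = node["op"]
        if name not in OPS:
            # Unknown ops are rejected instead of being looked up or evaluated.
            raise ValueError(f"unknown op: {name!r}")
        x = OPS[name](x, **node.get("args", {}))
    return x


# A pipeline that maps pixel values from [0, 255] to [-1, 1].
PIPELINE = json.dumps([
    {"op": "subtract", "args": {"value": 127.5}},
    {"op": "divide", "args": {"value": 127.5}},
    {"op": "clip", "args": {"lo": -1.0, "hi": 1.0}},
])

result = run_pipeline(PIPELINE, 255.0)
```

A real IR would operate on tensors and be lowered into the same graph as the network, but the property that matters here is the same: the shipped artifact is data, not a script.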

Any thoughts on supporting a data pipeline IR in MXNet/NNVM?

[Discussion] Data Pipeline Intermediate Representation in MXNet/NNVM #8589