NervanaSystems / ngraph

nGraph has moved to OpenVINO
https://www.ngraph.ai/
Apache License 2.0

BatchNormTrainingBackprop JSON error executing on nbench #2728

Closed speryt closed 5 years ago

speryt commented 5 years ago

When I execute the attached JSON file with nbench, I get the following error:

---- Benchmark ----
Exception caught on '/home/Share/BatchNormTraining.json'
Error parsing json at node 'BatchNormTrainingBackprop_16'

Confirmed with HEAD on both -b CPU and -b INTELGPU. I'm running it on Ubuntu 18.04.

BatchNormTraining.zip

rkimballn1 commented 5 years ago

The serializer is designed ONLY to support unprocessed graphs. If you run passes over the graph then you can serialize it but you can't deserialize it because it does not know how to reconstruct the graph. The problem here is that BatchNormTrainingBackprop_16 gets two of its inputs from BatchNormTraining_5 but it does not know which of the three outputs to use. Normally there would be GetOutputElement ops after BatchNormTraining_5 and those are removed using a pass. This graph has been serialized after that pass has run.
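The ambiguity described above can be illustrated with a toy model. This is only a sketch and not nGraph's serializer or its JSON schema: it assumes a format that records each input by producer name alone, and the names GetOutputElement_mean, GetOutputElement_var, serialize, and deserialize are hypothetical. With the GetOutputElement nodes in place, every edge can be rebuilt; once they are removed, the consumer points directly at a three-output node and the output index is lost.

```python
# Toy model only: this is not nGraph's serializer or its JSON schema.
OUTPUT_COUNTS = {"BatchNormTraining_5": 3}  # every other node has one output


def serialize(nodes):
    """Record each node's producers by name only, dropping the output index."""
    records = []
    for node in nodes:
        record = {
            "name": node["name"],
            "inputs": [producer for producer, _index in node["inputs"]],
        }
        if "n" in node:  # a GetOutputElement-style node keeps its index as an attribute
            record["n"] = node["n"]
        records.append(record)
    return records


def deserialize(records):
    """Rebuild (producer, output_index, consumer) edges, or fail if ambiguous."""
    edges = []
    for record in records:
        for producer in record["inputs"]:
            count = OUTPUT_COUNTS.get(producer, 1)
            if "n" in record:
                index = record["n"]
            elif count == 1:
                index = 0
            else:
                raise ValueError(
                    f"cannot rebuild '{record['name']}': "
                    f"'{producer}' has {count} outputs and no index was stored"
                )
            edges.append((producer, index, record["name"]))
    return edges


# Graph as built through the normal API (hypothetical GetOutputElement names).
unprocessed = [
    {"name": "BatchNormTraining_5", "inputs": []},
    {"name": "GetOutputElement_mean", "inputs": [("BatchNormTraining_5", 1)], "n": 1},
    {"name": "GetOutputElement_var", "inputs": [("BatchNormTraining_5", 2)], "n": 2},
    {"name": "BatchNormTrainingBackprop_16",
     "inputs": [("GetOutputElement_mean", 0), ("GetOutputElement_var", 0)]},
]

# Same graph after a pass has removed the GetOutputElement nodes.
after_pass = [
    {"name": "BatchNormTraining_5", "inputs": []},
    {"name": "BatchNormTrainingBackprop_16",
     "inputs": [("BatchNormTraining_5", 1), ("BatchNormTraining_5", 2)]},
]
```

In this sketch, `deserialize(serialize(unprocessed))` succeeds, while `deserialize(serialize(after_pass))` raises a ValueError naming BatchNormTrainingBackprop_16.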

dmyershov commented 5 years ago

I've run into the same issue when processing a Pad operation from the initial graph.

I used the cnn_tf_v1.13_compatible branch of the https://github.com/tensorflow/benchmarks.git repository (with "import ngraph_bridge" injected into tf_cnn_benchmarks.py). Note: the model option --freeze_when_forward_only=True was used to get an inference graph.

Command used to run the forward pass and dump the JSON files:
$ NGRAPH_TF_BACKEND="CPU" NGRAPH_ENABLE_SERIALIZE_TRACING=1 python3 tf_cnn_benchmarks.py --forward_only=true --model=resnet50 --batch_size=1 --num_inter_threads=1 --data_name=imagenet --num_batches=100 --data_format=NCHW --display_every 1 --num_warmup_batches=0 --freeze_when_forward_only=True

Command used to run nbench on the initial graph:
$ nbench -b CPU -f Function_57_000_N6ngraph4pass15LikeReplacementE.json

============================================================================
---- Processing 'Function_57_000_N6ngraph4pass15LikeReplacementE.json'

---- Benchmark ----
Exception caught on 'Function_57_000_N6ngraph4pass15LikeReplacementE.json'
Error parsing json at node 'Pad_25879'

rkimballn1 commented 5 years ago

The serializer is designed for original, unmodified graphs. If you run any passes on the graph, you may end up with a graph that can't be deserialized. Were any passes run on the graph prior to serializing it? How is the graph constructed? If you use the normal API to construct a graph then this should not happen.

rkimballn1 commented 5 years ago

The NGRAPH_ENABLE_SERIALIZE_TRACING flag makes the pass manager serialize the graph after each pass, so its output cannot be read back by the serializer. That output is intended for debugging the passes and will not work for what you want to do.
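The file name from the nbench run above is consistent with this. Its suffix, N6ngraph4pass15LikeReplacementE, has the shape of an Itanium C++ ABI mangled nested name, which suggests the file was dumped after a specific pass. The sketch below decodes such a suffix. It assumes the suffix really is a mangled pass type name, and the helper decode_nested_name is hypothetical and handles only plain nested names (no templates or substitutions).

```python
def decode_nested_name(mangled):
    """Decode a simple Itanium-style nested name such as 'N3foo3barE'."""
    if not (mangled.startswith("N") and mangled.endswith("E")):
        raise ValueError("not a simple nested name")
    body = mangled[1:-1]
    parts = []
    i = 0
    while i < len(body):
        # Each component is a decimal length followed by that many characters.
        j = i
        while j < len(body) and body[j].isdigit():
            j += 1
        if j == i:
            raise ValueError("expected a length prefix")
        length = int(body[i:j])
        if j + length > len(body):
            raise ValueError("component runs past the end of the name")
        parts.append(body[j:j + length])
        i = j + length
    return "::".join(parts)


# Suffix taken from the file name in the report above.
pass_name = decode_nested_name("N6ngraph4pass15LikeReplacementE")
```

Here `pass_name` is `ngraph::pass::LikeReplacement`, so the file appears to be a dump written after that pass rather than the original graph.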

You want to use the flag NGRAPH_ENABLE_SERIALIZE instead.
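Rerunning the earlier benchmark with the suggested flag can be sketched as follows. This is a minimal sketch, not a verified recipe: it reuses the command line from the report above, assumes NGRAPH_ENABLE_SERIALIZE is enabled with the value 1 in the same way as the tracing flag, and build_serialize_run is a hypothetical helper name.

```python
import os
import shlex


def build_serialize_run(base_env=None):
    """Return (command, environment) for a run that serializes the original graph."""
    env = dict(os.environ if base_env is None else base_env)
    # Drop the tracing flag so that no per-pass dumps are produced.
    env.pop("NGRAPH_ENABLE_SERIALIZE_TRACING", None)
    env["NGRAPH_TF_BACKEND"] = "CPU"
    env["NGRAPH_ENABLE_SERIALIZE"] = "1"  # assumed value, mirroring the tracing flag
    command = shlex.split(
        "python3 tf_cnn_benchmarks.py --forward_only=true --model=resnet50 "
        "--batch_size=1 --num_inter_threads=1 --data_name=imagenet "
        "--num_batches=100 --data_format=NCHW --display_every 1 "
        "--num_warmup_batches=0 --freeze_when_forward_only=True"
    )
    return command, env


command, env = build_serialize_run({"NGRAPH_ENABLE_SERIALIZE_TRACING": "1"})
# To launch it: subprocess.run(command, env=env, check=True)
```

The resulting JSON files can then be passed to nbench with -f as before.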

Let me know if this works for you so I can close the ticket.

speryt commented 5 years ago

Thanks, this works for me. I had indeed been using NGRAPH_ENABLE_SERIALIZE_TRACING previously. I don't know whether this also works for @dmyershov.