aws-neuron / aws-neuron-sdk

[TensorFlow 1.15] MobileNetV2 doesn't compile with --enable-fast-context-switch #348

Closed Askannz closed 2 years ago

Askannz commented 2 years ago

Issue description

Compiling the model MobileNetV2 provided by Keras fails when --enable-fast-context-switch is enabled in the compiler options.

Reproduction

Take the example at https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/tensorflow/tensorflow_resnet50/resnet50.html, swap ResNet50 for MobileNetV2, and add --enable-fast-context-switch to the compiler options:

import os
import time
import shutil
import tensorflow as tf
import tensorflow.neuron as tfn
import tensorflow.compat.v1.keras as keras
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2

# Create a workspace
WORKSPACE = './ws_mobilenet_v2'
os.makedirs(WORKSPACE, exist_ok=True)

# Prepare export directory (old one removed)
model_dir = os.path.join(WORKSPACE, 'mobilenet_v2')
compiled_model_dir = os.path.join(WORKSPACE, 'mobilenet_v2_neuron')
shutil.rmtree(model_dir, ignore_errors=True)
shutil.rmtree(compiled_model_dir, ignore_errors=True)

# Instantiate Keras MobileNetV2 model
keras.backend.set_learning_phase(0)
keras.backend.set_image_data_format('channels_last')

model = MobileNetV2(weights=None)

# Export SavedModel
tf.saved_model.simple_save(
    session            = keras.backend.get_session(),
    export_dir         = model_dir,
    inputs             = {'input': model.inputs[0]},
    outputs            = {'output': model.outputs[0]})

# Compile using Neuron
tfn.saved_model.compile(
    model_dir, compiled_model_dir,
    compiler_args=[
        '--enable-fast-context-switch'
    ]
)

Logs

WARNING:tensorflow:From /opt/seemode-env/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From repro2.py:27: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.

2021-10-29 01:47:38.381785: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-10-29 01:47:38.397652: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3800090000 Hz
2021-10-29 01:47:38.398016: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560626161b40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-10-29 01:47:38.398028: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From repro2.py:30: simple_save (from tensorflow.python.saved_model.simple_save) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.simple_save.
WARNING:tensorflow:From /opt/seemode-env/lib/python3.7/site-packages/tensorflow_core/python/saved_model/signature_def_utils_impl.py:201: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
2021-10-29 01:47:40.034486: I tensorflow/core/grappler/devices.cc:60] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA support)
2021-10-29 01:47:40.034566: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2021-10-29 01:47:40.062439: I tensorflow/neuron/grappler/convert/segment.cc:456] There are 3 ops of 2 different types in the graph that are not compiled by neuron-cc: Placeholder, NoOp, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.html).
2021-10-29 01:47:40.109133: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: graph_to_optimize
2021-10-29 01:47:40.109157: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   aws_neuron_static_shape_inference: Graph size after: 689 nodes (0), 698 edges (0), time = 4.779ms.
2021-10-29 01:47:40.109163: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   aws_neuron_fuse_supported_operators: Graph size after: 3 nodes (-686), 2 edges (-696), time = 48.612ms.
INFO:tensorflow:fusing subgraph {subgraph neuron_op_b67fa0a894269be4 with input tensors ["<tf.Tensor 'input_10/_0:0' shape=(1, 224, 224, 3) dtype=float32>"], output tensors ["<tf.Tensor 'Logits/Softmax:0' shape=(1, 1000) dtype=float32>"]} with neuron-cc
.......Estimated required event is 581 > available 220
....................10/29/2021 01:56:39 AM INFO [Stargazer]: Generating Arch 'Inferentia-1.0'
10/29/2021 01:56:39 AM INFO [Stargazer]: INFO: Pre SG DRAM bytes loaded or saved 45994912
10/29/2021 01:56:39 AM INFO [Stargazer]: INFO: Pre SG average DMA size 998 bytes
10/29/2021 01:56:39 AM INFO [Stargazer]: Num Loads in Func = 23088
10/29/2021 01:56:39 AM INFO [Stargazer]: Num Saves in Func = 5962
10/29/2021 01:56:39 AM INFO [Stargazer]: Num Input Loads in Func= 6939
10/29/2021 01:56:39 AM INFO [Stargazer]: Num Output Saves in Func= 1
10/29/2021 01:56:39 AM INFO [Stargazer]: Num Spill Loads in Func= 16149
10/29/2021 01:56:39 AM INFO [Stargazer]: Num Spill Saves in Func= 5961
.10/29/2021 01:56:42 AM INFO [Stargazer]: Wavegraph code generation for Inferentia:
10/29/2021 01:56:42 AM INFO [Stargazer]:     Engine              File
10/29/2021 01:56:42 AM INFO [Stargazer]:     ------              ----
10/29/2021 01:56:42 AM INFO [Stargazer]:     PE-Array            pe.bin
10/29/2021 01:56:42 AM INFO [Stargazer]:     Pool-Eng            pool.bin
10/29/2021 01:56:42 AM INFO [Stargazer]:     Act-Eng             act.bin
10/29/2021 01:56:42 AM INFO [Stargazer]: 
10/29/2021 01:56:42 AM INFO [Stargazer]: Fixing data race is 0
10/29/2021 01:56:43 AM INFO [Stargazer]: Data race checker engines
10/29/2021 01:56:43 AM INFO [Stargazer]: [Sailfish] Data race analysis initially
10/29/2021 01:56:44 AM INFO [Stargazer]: [Sailfish] Data race analysis found no races, run time: 0:00:01
10/29/2021 01:56:44 AM INFO [Stargazer]: [Sailfish] Remove redundant edges
10/29/2021 01:56:45 AM INFO [Stargazer]: Data race checker engines
10/29/2021 01:56:45 AM INFO [Stargazer]: Transitive reduction start 
10/29/2021 01:56:45 AM INFO [Stargazer]: Transitive reduction removed 10965 redundant edges, time: 0:00:00
10/29/2021 01:56:45 AM INFO [Stargazer]: Sync Critical Load Chains Start
10/29/2021 01:56:45 AM DEBUG [Stargazer]: SyncCritLoads buildLoadGraph Start...
10/29/2021 01:56:45 AM DEBUG [Stargazer]: SyncCritLoads buildLoadGraph Done.
10/29/2021 01:56:45 AM DEBUG [Stargazer]: Load Graph NumRoots; 932
10/29/2021 01:56:45 AM INFO [Stargazer]: Sync Critical Load Chains added 2 new Load-2-Load syncs
10/29/2021 01:56:45 AM INFO [Stargazer]: Sync Critical Load Chains Done.0:00:00

Followed by many lines similar to:

10/29/2021 01:56:50 AM WARNING [Stargazer]: SBUF DMA write size != 0 mod 4: SBUF address=0x64136, size=18

And then:

10/29/2021 01:56:51 AM INFO [Stargazer]: Out wavegraph bin file is wavegraph-bin.json
10/29/2021 01:56:51 AM INFO [Stargazer]: Writing NN JSON to file 'wavegraph-bin.json'
10/29/2021 01:56:57 AM INFO [Stargazer]: Virtual memory peak = 7142268 K bytes
10/29/2021 01:56:57 AM INFO [Stargazer]: PASSED - Total time: 0:00:17
..
Compiler status ERROR
WARNING:tensorflow:[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 32, 1]  [1, 112, 112, 32]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 96, 1]  [1, 113, 113, 96]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 144, 1]  [1, 56, 56, 144]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 144, 1]  [1, 57, 57, 144]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 192, 1]  [1, 28, 28, 192]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 192, 1]  [1, 28, 28, 192]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 192, 1]  [1, 29, 29, 192]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 384, 1]  [1, 14, 14, 384]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 384, 1]  [1, 14, 14, 384]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 384, 1]  [1, 14, 14, 384]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 384, 1]  [1, 14, 14, 384]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 576, 1]  [1, 14, 14, 576]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 576, 1]  [1, 14, 14, 576]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 576, 1]  [1, 15, 15, 576]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 960, 1]  [1, 7, 7, 960]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 960, 1]  [1, 7, 7, 960]
[01:47:41] /opt/brazil-pkg-cache/packages/DmlcTvm/DmlcTvm-1.7.2.0/AL2_x86_64/generic-flavor/src/src/relay/pass/do_replication.cc:66: Replication is not applied to group convolution. Kernel and data input channels do not match[3, 3, 960, 1]  [1, 7, 7, 960]
Analyzing dependencies of sg00/Block1
0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
Warning: scheduling level for function downgraded to 1.
/opt/seemode-env/bin/neuron-cc:8: DeprecationWarning: np.asscalar(a) is deprecated since NumPy v1.16, use a.item() instead
  sys.exit(main())
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: ***************************************************************
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:  An Internal Compiler Error has occurred
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: ***************************************************************
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: 
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: Error message:  Duplicate waveop name Remat_expanded_conv_depthwise/depthwise-t5283_i82185!
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: 
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: Error class:    AssertionError
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: Error location: Unknown
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: Command line:   /opt/seemode-env/bin/neuron-cc compile /tmp/tmp6oyuf3db/neuron_op_b67fa0a894269be4/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp6oyuf3db/neuron_op_b67fa0a894269be4/graph_def.neff --io-config '{"inputs": {"input_10/_0:0": [[1, 224, 224, 3], "float32"]}, "outputs": ["Logits/Softmax:0"]}' --enable-fast-context-switch --verbose=35
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: 
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: Internal details:
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   File "neuroncc/driver/CommandDriver.py", line 223, in neuroncc.driver.CommandDriver.CommandDriver.run
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   File "neuroncc/driver/commands/CompileCommand.py", line 485, in neuroncc.driver.commands.CompileCommand.CompileCommand.run
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   File "neuroncc/driver/Job.py", line 289, in neuroncc.driver.Job.SingleInputJob.run
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   File "neuroncc/driver/Job.py", line 289, in neuroncc.driver.Job.SingleInputJob.run
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   File "neuroncc/driver/Job.py", line 289, in neuroncc.driver.Job.SingleInputJob.run
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   File "neuroncc/driver/jobs/DataflowChecker.py", line 21, in neuroncc.driver.jobs.DataflowChecker.DataflowChecker.runSingleInput
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   File "neuroncc/driver/jobs/DataflowChecker.py", line 23, in neuroncc.driver.jobs.DataflowChecker.DataflowChecker.runSingleInput
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   File "neuroncc/starfish/wavegraph/WavegraphIR.py", line 44, in neuroncc.starfish.wavegraph.WavegraphIR.WavegraphIr.__init__
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   File "neuroncc/starfish/wavegraph/WavegraphIR.py", line 83, in neuroncc.starfish.wavegraph.WavegraphIR.WavegraphIr.__fromJSON
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   File "neuroncc/starfish/wavegraph/WavegraphIR.py", line 48, in neuroncc.starfish.wavegraph.WavegraphIR.WavegraphIr.addOp
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: 
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: Version information:
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   Neuron Compiler version 1.7.3.0+c02f9fde4
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   HWM version 1.7.0.0-0
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   NEFF version Dynamic
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   TVM version 1.7.2.0+0
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   NumPy version 1.18.5
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   MXNet not available
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]:   TF not available
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: 
10/29/2021 01:57:26 AM ERROR 126 [neuron-cc]: Artifacts stored in: /tmp/tmp6oyuf3db/neuron_op_b67fa0a894269be4

INFO:tensorflow:Number of operations in TensorFlow session: 2993
INFO:tensorflow:Number of operations after tf.neuron optimizations: 689
INFO:tensorflow:Number of operations placed on Neuron runtime: 0
WARNING:tensorflow:Converted ./ws_mobilenet_v2/mobilenet_v2 to ./ws_mobilenet_v2/mobilenet_v2_neuron but no operator will be running on AWS machine learning accelerators. This is probably not what you want. Please refer to https://github.com/aws/aws-neuron-sdk for current limitations of the AWS Neuron SDK. We are actively improving (and hiring)!

Full log: https://gist.github.com/Askannz/b6b1311cc8921e04e1179e2e64b2182b

Environment

System:
Ubuntu 18.04
python 3.7
aws-neuron-tools 2.0.277.0

Pip:
neuron-cc 1.7.3.0+c02f9fde4
tensorflow-neuron 1.15.5.2.0.3.0
tensorflow 1.15.5
awsrjh commented 2 years ago

Is this environment working for you without --enable-fast-context-switch?

Askannz commented 2 years ago

Yes, that same code in the same environment works when the switch is removed.

EDIT: to be more accurate, without that switch there is still a lot of spam in the log (lots of SBUF DMA messages), but the compilation completes without errors: https://gist.github.com/Askannz/7021e0a8495086fd2e40f6e36d4aaa97
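
For reference, the working variant is the same compile step as in the script above with only the flag removed (everything else unchanged):

# Compile using Neuron without --enable-fast-context-switch.
# On my setup this still prints the SBUF DMA warnings, but the
# compilation completes without the internal compiler error.
tfn.saved_model.compile(model_dir, compiled_model_dir)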

awsrjh commented 2 years ago

OK, thank you for the report. We will add it to our backlog.

If the compilation works for you without the switch, please continue to use that in the meantime.

aws-taylor commented 2 years ago

Hello @Askannz,

Just an update: we have identified the issue and expect it to be resolved with the next release of our compiler. In the meantime, if you are willing and able to share an example test case, we can verify it on our side.

-Taylor

shebbur-aws commented 2 years ago

Hello @Askannz,

We published a new release yesterday, and we expect your issue to be resolved with it. Please verify the fix and reopen the ticket if the problem persists.
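
One way to verify is to upgrade the neuron-cc and tensorflow-neuron packages to the new release and then re-run the compile step from your reproduction with the flag enabled, e.g.:

# Re-run the compile step from the reproduction after upgrading the
# neuron-cc / tensorflow-neuron packages. A successful run should place
# operations on the Neuron runtime (the failing run above reported
# "Number of operations placed on Neuron runtime: 0").
tfn.saved_model.compile(
    model_dir, compiled_model_dir,
    compiler_args=['--enable-fast-context-switch']
)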

Thanks, Shruthi.