aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

Converting TF2 MaskRCNN model to NeuronX on Inf2 instance fails #876

Closed saintarian closed 4 months ago

saintarian commented 5 months ago

I used a Python script (pasted further down) to try to convert a MaskRCNN TensorFlow model to NeuronX on an Inf2 instance. The full output of running the script is also pasted further down. The conversion appears to have failed, based on two observations:

  1. Running the converted model while monitoring utilization with neuron-top showed that the Inferentia cores were not being used at all (see the reload-and-run sketch after the script below).
  2. The following lines in the script output:
    2024-04-21 11:28:12.805365: I tensorflow/neuron/grappler/convert/segment.cc:456] There are 227 ops of 36 different types in the graph that are not compiled by neuron-cc: Identity, ZerosLike, CropAndResize, RealDiv, Split, AddV2, Pad, Less, Maximum, ConcatV2, Unpack, Cast, GatherV2, Sub, Where, Sum, Tile, Transpose, ExpandDims, ResizeBilinear, Fill, NoOp, Mul, Minimum, Range, NonMaxSuppressionV5, Slice, Shape, Reshape, Placeholder, Select, Pack, Squeeze, TopKV2, Greater, GreaterEqual, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.html).
    .
    .
    WARNING:tensorflow:neuron-cc failed with:
    2024-04-21T11:28:42Z [TEN404] (StatefulPartitionedCall/model/block17_1_conv/Conv2D_convolution.867) Internal tensorizer error - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new
    .
    .
    WARNING:tensorflow:Warning: Your traced model has -228800.0% of operators compiled to neuron.

Python script:

import tensorflow as tf
import tensorflow_neuronx as tfnx
from PIL import Image
import numpy as np

# Load the TF2 SavedModel and grab its serving signature.
model_path = "saved_model"
loaded = tf.saved_model.load(model_path)
model = loaded.signatures["serving_default"]

# Build an example input: one 1024x1024 RGB image as a float32 batch of 1.
img = Image.open("input/pic1.jpg").convert("RGB").resize((1024, 1024))
img_array = np.array(img).astype(np.float32)
input = np.expand_dims(img_array, 0)

# Trace the model for Neuron using the example input, then save the result.
print("Tracing model")
model_neuron = tfnx.trace(model, {"inputs": input})
print("Done tracing model")
model_neuron.save('./model-neuron')
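
For observation 1, this is roughly what the check looks like (a minimal sketch, not the exact code I ran; it assumes ./model-neuron reloads as a standard SavedModel and keeps a serving_default signature with an "inputs" key): reload the saved artifact and run it in a loop while watching neuron-top in another terminal.

import numpy as np
import tensorflow as tf
from PIL import Image

# Reload the traced model saved by the script above (assumes a standard SavedModel).
reloaded = tf.saved_model.load("./model-neuron")
infer = reloaded.signatures["serving_default"]  # assumes this signature name is preserved

# Same example input as in the trace script.
img = Image.open("input/pic1.jpg").convert("RGB").resize((1024, 1024))
batch = np.expand_dims(np.array(img).astype(np.float32), 0)

# Keep the model busy so any NeuronCore utilization is visible in neuron-top.
for _ in range(100):
    outputs = infer(inputs=tf.constant(batch))
print(list(outputs.keys()))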

Installed pip neuron packages:

neuron-cc==1.22.0.0+d4b4f5311
neuronx-cc==2.13.68.0+6dfecc895
tensorboard-plugin-neuronx==2.6.7.0
tensorflow-neuron==2.10.1.2.10.19.0
tensorflow-neuronx==2.10.1.2.1.0

Full output:

python convert_to_neuron.py
2024-04-21 11:27:46.235733: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-21 11:27:46.342874: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/lib
2024-04-21 11:27:46.342906: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-04-21 11:27:46.361251: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-21 11:27:46.852400: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/lib
2024-04-21 11:27:46.852458: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/lib
2024-04-21 11:27:46.852470: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2024-04-21 11:27:47.294231: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/lib
2024-04-21 11:27:47.294265: W tensorflow/stream_executor/cuda/cuda_driver.cc:263] failed call to cuInit: UNKNOWN ERROR (303)
2024-04-21 11:27:47.294289: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ip-172-31-6-59): /proc/driver/nvidia/version does not exist
2024-04-21 11:27:48.387283: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Tracing model
2024-04-21 11:28:04.569662: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2024-04-21 11:28:04.569996: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-04-21 11:28:11.285351: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2024-04-21 11:28:11.285490: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-04-21 11:28:12.805365: I tensorflow/neuron/grappler/convert/segment.cc:456] There are 227 ops of 36 different types in the graph that are not compiled by neuron-cc: Identity, ZerosLike, CropAndResize, RealDiv, Split, AddV2, Pad, Less, Maximum, ConcatV2, Unpack, Cast, GatherV2, Sub, Where, Sum, Tile, Transpose, ExpandDims, ResizeBilinear, Fill, NoOp, Mul, Minimum, Range, NonMaxSuppressionV5, Slice, Shape, Reshape, Placeholder, Select, Pack, Squeeze, TopKV2, Greater, GreaterEqual, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.html).
2024-04-21 11:28:23.797464: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2024-04-21 11:28:23.797598: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-04-21 11:28:24.884628: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2024-04-21 11:28:24.884761: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-04-21 11:28:25.227231: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2024-04-21 11:28:25.227320: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-04-21 11:28:25.375183: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2024-04-21 11:28:25.375337: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-04-21 11:28:25.876230: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-21 11:28:25.902884: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
.root = /usr/lib/python3.10/multiprocessing/process.py
root = /usr/lib/python3.10/multiprocessing
root = /usr/lib/python3.10
root = /usr/lib
root = /usr

WARNING:tensorflow:neuron-cc failed with:
2024-04-21T11:28:42Z [TEN404] (StatefulPartitionedCall/model/block17_1_conv/Conv2D_convolution.867) Internal tensorizer error - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new

2024-04-21 11:28:43.122326: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-21 11:28:43.127727: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
..........
Compiler status PASS
2024-04-21 11:31:47.977401: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-21 11:31:47.982348: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
..
Compiler status PASS
2024-04-21 11:32:23.713727: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-21 11:32:23.716551: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
..
Compiler status PASS
WARNING:tensorflow:Warning: Your traced model has -228800.0% of operators compiled to neuron.
Done tracing model
aws-rhsoln commented 5 months ago

Thank you for reporting the issue. The above logs indicate that the partitioner determined the model would run more efficiently on CPU than on Neuron, so all operators were partitioned to CPU. Optimized MaskRCNN support is not part of this year's roadmap.
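
A quick way to confirm this on your side is to count the Neuron subgraph ops in the exported graph. This is only a sketch: it assumes the saved model keeps its serving_default signature and that compiled subgraphs show up as node types containing "Neuron" (e.g. "NeuronOp"); a count of zero would be consistent with everything having been partitioned to CPU.

import tensorflow as tf

# Load the traced model and pull the GraphDef behind its serving signature.
reloaded = tf.saved_model.load("./model-neuron")
func = reloaded.signatures["serving_default"]
graph_def = func.graph.as_graph_def()

# Collect nodes from the outer graph and from every nested function in its library.
nodes = list(graph_def.node)
for fdef in graph_def.library.function:
    nodes.extend(fdef.node_def)

# Count nodes whose op type looks like a Neuron-compiled subgraph (the name is an assumption).
neuron_nodes = [n for n in nodes if "Neuron" in n.op]
print(f"{len(neuron_nodes)} Neuron subgraph node(s) out of {len(nodes)} total nodes")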

saintarian commented 4 months ago

Thanks @aws-rhsoln. Closing the issue.