aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

Converting TF2 MaskRCNN model to NeuronX on Inf2 instance fails #876

Closed saintarian closed 4 months ago

saintarian commented 5 months ago

I used a Python script (pasted further down) to try to convert a MaskRCNN TensorFlow model to NeuronX on an Inf2 instance. The full output of running the script is also pasted further down. The conversion appears to have failed, based on two observations:

  1. Running the converted model while monitoring utilization with neuron-top showed that the Inferentia cores were not being used at all (see the reload-and-run sketch after the script below).
  2. The following lines in the script output:
    2024-04-21 11:28:12.805365: I tensorflow/neuron/grappler/convert/segment.cc:456] There are 227 ops of 36 different types in the graph that are not compiled by neuron-cc: Identity, ZerosLike, CropAndResize, RealDiv, Split, AddV2, Pad, Less, Maximum, ConcatV2, Unpack, Cast, GatherV2, Sub, Where, Sum, Tile, Transpose, ExpandDims, ResizeBilinear, Fill, NoOp, Mul, Minimum, Range, NonMaxSuppressionV5, Slice, Shape, Reshape, Placeholder, Select, Pack, Squeeze, TopKV2, Greater, GreaterEqual, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.html).
    .
    .
    WARNING:tensorflow:neuron-cc failed with:
    2024-04-21T11:28:42Z [TEN404] (StatefulPartitionedCall/model/block17_1_conv/Conv2D_convolution.867) Internal tensorizer error - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new
    .
    .
    WARNING:tensorflow:Warning: Your traced model has -228800.0% of operators compiled to neuron.

Python script:

import tensorflow as tf
import tensorflow_neuronx as tfnx
from PIL import Image
import numpy as np

# Load the TF2 SavedModel and grab its serving signature.
model_path = "saved_model"
loaded = tf.saved_model.load(model_path)
model = loaded.signatures["serving_default"]

# Build an example input: one 1024x1024 RGB image as a float32 batch of 1.
img = Image.open("input/pic1.jpg").convert("RGB").resize((1024, 1024))
img_array = np.array(img).astype(np.float32)
input = np.expand_dims(img_array, 0)

# Trace the model for Neuron using the example input, then save the result.
print("Tracing model")
model_neuron = tfnx.trace(model, {"inputs": input})
print("Done tracing model")
model_neuron.save('./model-neuron')
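
For observation 1, this is roughly what the check looks like (a minimal sketch, not the exact code I ran; it assumes ./model-neuron reloads as a standard SavedModel and keeps a serving_default signature with an "inputs" key): reload the saved artifact and run it in a loop while watching neuron-top in another terminal.

import numpy as np
import tensorflow as tf
from PIL import Image

# Reload the traced model saved by the script above (assumes a standard SavedModel).
reloaded = tf.saved_model.load("./model-neuron")
infer = reloaded.signatures["serving_default"]  # assumes this signature name is preserved

# Same example input as in the trace script.
img = Image.open("input/pic1.jpg").convert("RGB").resize((1024, 1024))
batch = np.expand_dims(np.array(img).astype(np.float32), 0)

# Keep the model busy so any NeuronCore utilization is visible in neuron-top.
for _ in range(100):
    outputs = infer(inputs=tf.constant(batch))
print(list(outputs.keys()))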

Installed pip neuron packages:

neuron-cc==1.22.0.0+d4b4f5311
neuronx-cc==2.13.68.0+6dfecc895
tensorboard-plugin-neuronx==2.6.7.0
tensorflow-neuron==2.10.1.2.10.19.0
tensorflow-neuronx==2.10.1.2.1.0

Full output:

python convert_to_neuron.py
2024-04-21 11:27:46.235733: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-21 11:27:46.342874: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/lib
2024-04-21 11:27:46.342906: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-04-21 11:27:46.361251: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-04-21 11:27:46.852400: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/lib
2024-04-21 11:27:46.852458: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/lib
2024-04-21 11:27:46.852470: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2024-04-21 11:27:47.294231: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:/usr/lib
2024-04-21 11:27:47.294265: W tensorflow/stream_executor/cuda/cuda_driver.cc:263] failed call to cuInit: UNKNOWN ERROR (303)
2024-04-21 11:27:47.294289: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ip-172-31-6-59): /proc/driver/nvidia/version does not exist
2024-04-21 11:27:48.387283: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Tracing model
2024-04-21 11:28:04.569662: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2024-04-21 11:28:04.569996: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-04-21 11:28:11.285351: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2024-04-21 11:28:11.285490: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-04-21 11:28:12.805365: I tensorflow/neuron/grappler/convert/segment.cc:456] There are 227 ops of 36 different types in the graph that are not compiled by neuron-cc: Identity, ZerosLike, CropAndResize, RealDiv, Split, AddV2, Pad, Less, Maximum, ConcatV2, Unpack, Cast, GatherV2, Sub, Where, Sum, Tile, Transpose, ExpandDims, ResizeBilinear, Fill, NoOp, Mul, Minimum, Range, NonMaxSuppressionV5, Slice, Shape, Reshape, Placeholder, Select, Pack, Squeeze, TopKV2, Greater, GreaterEqual, (For more information see https://awsdocs-neuron.readthedocs-hosted.com/en/latest/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.html).
2024-04-21 11:28:23.797464: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2024-04-21 11:28:23.797598: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-04-21 11:28:24.884628: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2024-04-21 11:28:24.884761: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-04-21 11:28:25.227231: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2024-04-21 11:28:25.227320: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-04-21 11:28:25.375183: I tensorflow/core/grappler/devices.cc:66] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2024-04-21 11:28:25.375337: I tensorflow/core/grappler/clusters/single_machine.cc:358] Starting new session
2024-04-21 11:28:25.876230: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-21 11:28:25.902884: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
.root = /usr/lib/python3.10/multiprocessing/process.py
root = /usr/lib/python3.10/multiprocessing
root = /usr/lib/python3.10
root = /usr/lib
root = /usr

WARNING:tensorflow:neuron-cc failed with:
2024-04-21T11:28:42Z [TEN404] (StatefulPartitionedCall/model/block17_1_conv/Conv2D_convolution.867) Internal tensorizer error - Please open a support ticket at https://github.com/aws-neuron/aws-neuron-sdk/issues/new

2024-04-21 11:28:43.122326: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-21 11:28:43.127727: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
..........
Compiler status PASS
2024-04-21 11:31:47.977401: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-21 11:31:47.982348: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
..
Compiler status PASS
2024-04-21 11:32:23.713727: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-21 11:32:23.716551: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
..
Compiler status PASS
WARNING:tensorflow:Warning: Your traced model has -228800.0% of operators compiled to neuron.
Done tracing model
aws-rhsoln commented 5 months ago

Thank you for reporting the issue. The above logs indicate that the partitioner determined the model would run more efficiently on CPU than on Neuron, so all operators were partitioned to CPU. Optimized MaskRCNN support is not part of this year's roadmap.
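
A quick way to confirm this on your side is to count the Neuron subgraph ops in the exported graph. This is only a sketch: it assumes the saved model keeps its serving_default signature and that compiled subgraphs show up as node types containing "Neuron" (e.g. "NeuronOp"); a count of zero would be consistent with everything having been partitioned to CPU.

import tensorflow as tf

# Load the traced model and pull the GraphDef behind its serving signature.
reloaded = tf.saved_model.load("./model-neuron")
func = reloaded.signatures["serving_default"]
graph_def = func.graph.as_graph_def()

# Collect nodes from the outer graph and from every nested function in its library.
nodes = list(graph_def.node)
for fdef in graph_def.library.function:
    nodes.extend(fdef.node_def)

# Count nodes whose op type looks like a Neuron-compiled subgraph (the name is an assumption).
neuron_nodes = [n for n in nodes if "Neuron" in n.op]
print(f"{len(neuron_nodes)} Neuron subgraph node(s) out of {len(nodes)} total nodes")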

saintarian commented 4 months ago

Thanks @aws-rhsoln. Closing the issue.