aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost-effective, natively integrated into PyTorch and TensorFlow, and integrated with your favorite AWS services.
https://aws.amazon.com/machine-learning/neuron/

Inferentia performance is not desirable #824

Open mostafafarzaneh opened 5 months ago

mostafafarzaneh commented 5 months ago

I recently migrated from Elastic Inference to Inferentia for inference, expecting improved performance. However, the results are not as promising as anticipated. I ran 1000 predictions on different instances, including CPU, EI (Elastic Inference), Inf1, and Inf2. Here are the observed inference times for my model:

- eia2.large: 0.2506 seconds
- inf1.xlarge: 0.5792 seconds
- inf2.xlarge: 0.3587 seconds
- CPU (6 cores): 0.5817 seconds

As discussed in #821, I ran into challenges compiling custom signatures for Inf1 and Inf2. Because of those limitations, I can only compile the model and then apply a custom signature afterwards (does anyone know if this is possible?). This might be affecting the overall performance, but I am not sure it is the root cause.

jluntamazon commented 5 months ago

Are there any more details you could provide about the underlying model?

If there is an open source equivalent that we could look at, this could help us diagnose if there is a problem with the operator support. Another useful artifact would be the compilation log output.

One thing to check is that you are using the correct framework on each instance: a model compiled for Inf1 (with tensorflow-neuron) is not compatible with Inf2, which requires tensorflow-neuronx.
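For example, a minimal sketch of the two trace paths (the stand-in model and shapes here are placeholders, and each package is normally installed only on its matching instance type):

import tensorflow as tf

# A trivial stand-in model and input, just to illustrate the two trace paths
model = tf.keras.Sequential([tf.keras.layers.Conv2D(8, 3, input_shape=(224, 224, 3))])
sample_input = tf.zeros([1, 224, 224, 3])

# On an inf1 host, tensorflow-neuron is the matching framework:
# import tensorflow.neuron as tfn
# model_inf1 = tfn.trace(model, sample_input)

# On an inf2 (or trn1) host, tensorflow-neuronx is the matching framework:
import tensorflow_neuronx as tfnx
model_inf2 = tfnx.trace(model, sample_input)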

Lastly, another way you could check for operator support is to use the analyze_model function.
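For instance, mirroring the commented-out call in the compile script later in this thread (the stand-in model is the same placeholder as above):

import tensorflow as tf
import tensorflow_neuronx as tfnx

model = tf.keras.Sequential([tf.keras.layers.Conv2D(8, 3, input_shape=(224, 224, 3))])
sample_input = tf.zeros([1, 224, 224, 3])

# Reports which operators in the model are supported on Neuron
results = tfnx.analyze_model(model, sample_input)
print(results)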

mostafafarzaneh commented 5 months ago

Thank you @jluntamazon

I appreciate your guidance. Here's the additional information and outputs you requested:

Our model follows a U-Net-like structure built from separable convolutional layers, normalization layers (group and instance normalization), and custom residual blocks.
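Roughly, the building block looks something like the following (a hypothetical sketch rather than our exact code; the filter counts, group sizes, and activations are made up):

import tensorflow as tf
from tensorflow_addons.layers import GroupNormalization

def residual_sep_block(x, filters):
    # Two separable convolutions with normalization, plus a residual shortcut
    y = tf.keras.layers.SeparableConv2D(filters, 3, padding="same")(x)
    y = GroupNormalization(groups=8)(y)
    y = tf.keras.layers.Activation("relu")(y)
    y = tf.keras.layers.SeparableConv2D(filters, 3, padding="same")(y)
    y = GroupNormalization(groups=8)(y)
    if x.shape[-1] != filters:
        # 1x1 projection so the shortcut's channel count matches
        x = tf.keras.layers.Conv2D(filters, 1, padding="same")(x)
    return tf.keras.layers.Activation("relu")(tf.keras.layers.Add()([x, y]))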

Also, as discussed in https://github.com/aws-neuron/aws-neuron-sdk/issues/821, I ran into challenges compiling custom signatures for Inf1 and Inf2. Because of those limitations, I can only compile the model first and then apply a custom signature afterwards (do you know if this is possible?). This might be affecting the overall performance, but I am not sure it is the root cause.
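To illustrate what I mean, here is a rough, untested sketch of attaching a signature only after tracing (the input shape, tensor names, and export path are hypothetical, and it assumes the traced model is a callable trackable object):

import tensorflow as tf
import tensorflow_neuronx as tfnx

traced = tfnx.trace(model, sample_input)  # model/sample_input as in the script below

# Hypothetical input spec; the real shape and dtype depend on the model
@tf.function(input_signature=[tf.TensorSpec([1, 512, 512, 3], tf.float32, name="image")])
def serve(image):
    return {"output": traced(image)}

# Save with an explicit serving signature instead of the default one
tf.saved_model.save(traced, "exported_model", signatures={"serving_default": serve})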

You can find the compilation and analysis output attached.

analize.txt compile.txt

Also, here is my compile code in case you are interested:

import argparse

import cv2
import numpy as np
import tensorflow as tf
#import tensorflow.neuron as tfn  # tensorflow-neuron would be used for inf1
import tensorflow_neuronx as tfnx
# Imported so that Keras can deserialize these custom layers when loading
from tensorflow_addons.layers import (
    GroupNormalization,
    InstanceNormalization,
)

parser = argparse.ArgumentParser(description='Export model for inference')
parser.add_argument("--input", type=str, help='path to model hdf5 file')
parser.add_argument("--output", type=str, help='export path')
args = parser.parse_args()

def load_image(image_path):
    # Load an image and preprocess it into a batched float32 tensor
    image = cv2.imread(image_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_array = image.astype(np.float32)
    image_array /= 255.0  # scale pixel values to [0, 1]
    image_array = np.expand_dims(image_array, axis=0)  # add batch dimension
    return tf.constant(image_array)

# Training-only custom objects are stubbed out; they are not needed for inference
model = tf.keras.models.load_model(args.input, custom_objects={
    'tversky_loss': None,
    'loss_fn': None,
    'ssim': None
})

# Compile (trace) the model for Neuron using a representative sample input
sample_input = load_image("we_01.jpg")
model_neuron = tfnx.trace(model, sample_input)
model_neuron.save(args.output)

# Optional: report Neuron operator support instead of compiling
#results = tfnx.analyze_model(model, sample_input)
#print(results)
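With the script saved as, say, compile.py (name hypothetical), exporting on an inf2 instance would look like:

python compile.py --input model.h5 --output neuron_model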