Low utilization when running ResNet50

Hello, I have the following code to test a sample inference run using ResNet50:

from tensorflow.python.compiler.mlcompute import mlcompute
import tensorflow as tf
# Import from tf.keras so TensorFlow's bundled Keras is used, not a standalone keras package
from tensorflow.keras.applications import ResNet50
import numpy as np

tf.compat.v1.disable_eager_execution()

mlcompute.set_mlc_device(device_name='any')
# mlcompute.set_mlc_device(device_name='cpu')
# mlcompute.set_mlc_device(device_name='gpu')

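BATCH_SIZE = 128
# 1024 blank 224x224 RGB images; float32 avoids NumPy's float64 default
data = np.zeros((1024, 224, 224, 3), dtype=np.float32)
# ResNet50 with the default ImageNet weights
model = ResNet50()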

result = model.predict(data, batch_size=BATCH_SIZE)
print(result.shape)

During model.predict(), GPU utilization averages around one third and the CPU remains about 80% idle. Is this the expected performance? Is there something I should do to increase utilization?

Both mlcompute.is_tf_compiled_with_apple_mlc() and mlcompute.is_apple_mlc_enabled() report 'True'.
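
To make the question easier to reproduce, here is a rough sketch of how throughput could be compared across batch sizes, since utilization figures alone do not show whether a different batch size would help. The helper name measure_throughput and the stand-in workload fake_predict are made up for illustration and are not part of TensorFlow or mlcompute. With the script above, the callable would wrap model.predict instead.

```python
import time


def measure_throughput(predict_fn, num_samples, batch_sizes, repeats=3):
    """Return {batch_size: samples_per_second}, using the fastest of `repeats` timed runs.

    predict_fn is any callable taking a batch size (hypothetical interface).
    """
    results = {}
    for batch_size in batch_sizes:
        predict_fn(batch_size)  # warm-up run, excluded from timing
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            predict_fn(batch_size)
            best = min(best, time.perf_counter() - start)
        # Guard against a zero elapsed time on very coarse clocks
        results[batch_size] = num_samples / max(best, 1e-9)
    return results


def fake_predict(batch_size, num_samples=1024):
    """Stand-in workload (assumption): one small loop per batch, no real model."""
    num_batches = (num_samples + batch_size - 1) // batch_size
    for _ in range(num_batches):
        sum(range(1000))


# With the real model this would be, for example:
#   measure_throughput(lambda bs: model.predict(data, batch_size=bs),
#                      len(data), [32, 64, 128, 256])
results = measure_throughput(fake_predict, 1024, [32, 128], repeats=2)
for batch_size, rate in results.items():
    print(f"batch_size={batch_size}: {rate:.0f} samples/s")
```

Taking the fastest of several runs after a warm-up keeps one-time setup costs, such as graph construction, out of the comparison.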

Issue #241 in apple/tensorflow_macos