Closed ferranmartinezlleida closed 3 years ago
Hi ferranmartinezlleida, thank you for raising this. We are looking into it.
@awsryg Thank you, I appreciate that
Hi ferranmartinezlleida, can you please try saving the model in TF 1.x format rather than 2.0?
@awsryg Could be that, I saved it in TF 2.x. I'll try it and I'll let you know the result.
Hi @awsryg, I tried compiling a model downloaded from TensorFlow Hub that was created with TF 1.x and still ran into problems, although the error is different this time:
ValueError: batch_size is not sufficient to determine the shape of input tensor Tensor("hub_input/image_tensor:0", shape=(1, ?, ?, 3), dtype=float32)
This is the model used: https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1
But I was able to overcome it by modifying the compilation script:
import os
import time
import shutil
import tensorflow as tf
import tensorflow.neuron as tfn
import tensorflow.compat.v1.keras as keras
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input

WORKSPACE = './ws_yolov4'
os.makedirs(WORKSPACE, exist_ok=True)
model_dir = os.path.join(WORKSPACE, 'saved_model-t1')
compiled_model_dir = os.path.join(WORKSPACE, 'saved_model_neuron')

keras.backend.set_learning_phase(0)
keras.backend.set_image_data_format('channels_last')

model_saved_dir = "./ws_yolov4/d_saved_model-t1"

with tf.Session() as sess:
    _ = tf.compat.v1.saved_model.load(sess,tags=[],export_dir="./ws_yolov4/saved_model-t1")
    zeros_input = tf.compat.v1.keras.initializers.Zeros(dtype=tf.dtypes.float32)(shape=(1,224,224,3))
    zeros_output = tf.compat.v1.keras.initializers.Zeros(dtype=tf.dtypes.float32)(shape=(1,1000))
    tf.saved_model.simple_save(
        session = sess,
        export_dir = model_saved_dir,
        inputs = {'input': zeros_input},
        outputs = {'output': zeros_output})

tfn.saved_model.compile(model_saved_dir, compiled_model_dir)
shutil.make_archive('./saved_model-neuron', 'zip', WORKSPACE, 'saved_model_neuron')
Finally I managed to compile it! But the operators of the model are not yet supported by the Neuron SDK:
WARNING:tensorflow:Converted ./ws_yolov4/d_saved_model-t1 to ./ws_yolov4/saved_model_neuron but no operator will be running on AWS machine learning accelerators. This is probably not what you want. Please refer to https://github.com/aws/aws-neuron-sdk for current limitations of the AWS Neuron SDK. We are actively improving (and hiring)!
I will try to save my custom model with this format and correct operators and see what happens.
One more thing. During the process I also tried to compile another model:
https://tfhub.dev/google/object_detection/mobile_object_localizer_v1/1
and got this other error:
ValueError: Node 'Postprocessor/BatchMultiClassNonMaxSuppression/map/TensorArrayUnstack_1/TensorArrayScatter/TensorArrayScatterV3' expects to be colocated with unknown node 'Postprocessor/raw_box_scores'
So I guess the way you save the model is very important for the SDK to be able to compile it.
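A note on the first error in this comment: the input tensor is reported with shape (1, ?, ?, 3), so the height and width are unknown and the batch size alone cannot pin down the full shape. The sketch below is plain Python, not TensorFlow or Neuron SDK code. The helper names `unknown_dims` and `resolve_shape` are hypothetical, and 224 is simply the size used in the script above. It only illustrates checking which dimensions still need a concrete value before compilation.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def unknown_dims(shape: Sequence[Optional[int]]) -> List[int]:
    """Return the indices of dimensions that are not statically known (None)."""
    return [i for i, dim in enumerate(shape) if dim is None]


def resolve_shape(shape: Sequence[Optional[int]],
                  overrides: Dict[int, int]) -> Tuple[int, ...]:
    """Fill unknown dimensions from `overrides` ({index: size}).

    Raises ValueError if any dimension is still unknown afterwards.
    """
    resolved = [overrides.get(i) if dim is None else dim
                for i, dim in enumerate(shape)]
    missing = [i for i, dim in enumerate(resolved) if dim is None]
    if missing:
        raise ValueError(f"dimensions {missing} are still unknown")
    return tuple(resolved)


# Shape from the error message, with '?' written as None.
hub_input = (1, None, None, 3)
print(unknown_dims(hub_input))                      # [1, 2]
print(resolve_shape(hub_input, {1: 224, 2: 224}))   # (1, 224, 224, 3)
```

Only the batch dimension is known in the reported shape, which matches the wording of the error: the two spatial dimensions also have to be fixed somehow.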
@ferranmartinezlleida The TensorArrayScatterV3 operator is not supported by neuron-sdk (it runs in the framework, on CPU). You can see the list of supported operators by running: neuron-cc list-operators --framework TENSORFLOW
However other parts of the model should have been accelerated (they are not when compilation fails and execution thus falls back to framework). Can you share your saved model and compilation log file?
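For anyone who wants to automate the operator check mentioned above, here is a minimal sketch. It assumes the output of `neuron-cc list-operators --framework TENSORFLOW` is one operator name per line, which is an assumption about the format and not something confirmed in this thread. The function names and the sample listing are hypothetical and are not real compiler output.

```python
from typing import Iterable, List, Set


def parse_supported_ops(listing: str) -> Set[str]:
    """Collect operator names, assuming one name per non-empty line."""
    return {line.strip() for line in listing.splitlines() if line.strip()}


def unsupported_ops(model_ops: Iterable[str], supported: Set[str]) -> List[str]:
    """Return the operator types used by the model that are not in `supported`."""
    return sorted(set(model_ops) - supported)


# Hypothetical sample listing, for illustration only. In practice the text
# would come from running the neuron-cc command quoted above, for example
# via subprocess.run(..., capture_output=True, text=True).stdout.
sample_listing = "Conv2D\nRelu\nMaxPool\n"

# Hypothetical list of operator types found in a model graph.
model_ops = ["Conv2D", "Relu", "TensorArrayScatterV3"]

print(unsupported_ops(model_ops, parse_supported_ops(sample_listing)))
# ['TensorArrayScatterV3']
```

Operators reported as unsupported would be expected to stay in the framework on CPU, as described above, while the rest of the graph is the part that can be accelerated.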
Hi ferranmartinezlleida, I have reproduced the issue with https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1 and https://tfhub.dev/google/object_detection/mobile_object_localizer_v1/1 and we are investigating.
Thank you very much! @jeffhataws I'll keep an eye on this issue for your updates. @aws-zejdaj I'm sorry, but unfortunately I can't share the model. I've also been assigned to investigate other things, so I can't give this issue my full attention at the moment.
Hi @ferranmartinezlleida - we are continuing to work on this issue and will provide an update when we have made more progress
Hi @ferranmartinezlleida,
As a follow-up, the team is working on object detection models, and you can find them on our roadmap here: https://github.com/aws/aws-neuron-sdk/projects/2. In particular, our work on Faster RCNN may be of interest, since it is related to https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1. Work on this is still in progress.
I’m assuming that our YOLOv4 example did not meet your needs?
No, I couldn't get my custom model through it. I don't know if I did something wrong; I'm not an expert in TensorFlow, I was just exploring options for a project at my company. What I can assure you, though, is that the model worked. I was able to run correct inferences in Python.
Hello @ferranmartinezlleida,
I understand. In that case, I'd suggest you 'Watch' our associated roadmap item for Faster RCNN - https://github.com/aws/aws-neuron-sdk/issues/153 as it progresses. If you are able to give more specifics about the error messages you saw with YOLOv4, we would also be happy to assist with those.
Regards, Taylor
@ferranmartinezlleida Please reopen if still running into the issue and provide us the specific error and testcase.
Hello, thanks for all your nice tutorials on the SDK. I've followed quite a few without any problems (the ones from the Docker part and ResNet50), but when I tried to follow the YOLOv4 Neuron compilation tutorial here: https://github.com/aws/aws-neuron-sdk/blob/master/src/examples/tensorflow/yolo_v4_demo/evaluate.ipynb, I encountered an error I'm not able to solve.
I've tried to use this script, with some modifications to the directories so that they point to my custom model:
When I execute the script I get the following error:
ValueError: Input 1 of node StatefulPartitionedCall was passed float from conv2d/kernel:0 incompatible with expected resource.
I also tried modifying the compile_resnet50 script in order to put my custom model through it, but I get the same error:
What I can assure you is that the .pb model at yolov4-416 works; I've been able to run detections with it. The characteristics of the model are:
As for the environment, I'm using Ubuntu 18.04 on an inf1.xlarge machine, with all the steps described here done and verified: https://github.com/aws/aws-neuron-sdk/blob/master/docs/neuron-install-guide.md
Maybe I'm doing something wrong when I load the model before compilation. Could you give me some directions? Maybe I have to take some extra steps with my model before I start the compilation, or maybe I'm loading it wrong. Thank you!
The full error trace for the first piece of code:
The error trace for the second piece of code: