aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0
9.92k stars 6.71k forks

SageMaker Neo Object Detection Compilation Job Failed #1284

Open shitijkarsolia opened 4 years ago

shitijkarsolia commented 4 years ago

I have a custom-trained object detection network that detects vehicle license plates, and I am trying to optimize it for deployment on an NVIDIA Jetson Xavier with Neo compilation. The framework is Keras. I compressed the model.h5 file, which contains both the weights and the architecture, into model.tar.gz and uploaded it to S3.
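For readers reproducing the packaging step described above, here is a minimal sketch using only the Python standard library. The helper names `package_model` and `list_archive` are made up for illustration, and placing `model.h5` at the archive root is an assumption, since the thread does not say how the tarball was laid out.

```python
import tarfile
from pathlib import Path
from typing import List


def package_model(model_path: str, archive_path: str = "model.tar.gz") -> str:
    """Pack a single Keras .h5 file into a gzip-compressed tarball."""
    src = Path(model_path)
    if not src.is_file():
        raise FileNotFoundError(f"model file not found: {src}")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname stores the file at the archive root, without parent directories
        # (assumed layout; the thread does not specify one).
        tar.add(str(src), arcname=src.name)
    return archive_path


def list_archive(archive_path: str) -> List[str]:
    """Return the member names in the tarball, to check it before uploading."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return tar.getnames()
```

Checking `list_archive("model.tar.gz")` before uploading is a cheap way to rule out a malformed archive as the cause of a failed job. The upload itself could then be done with boto3's `upload_file` on an S3 client, with a bucket and key of your own choosing.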

I have run multiple compilation jobs, but all of them failed with the following message: "ServerError: There was an internal server error during the compilation job, please try again in a few minutes or contact amazon-neo-feedback@amazon.com to get support for this compilation job."

I was hoping to get answers to a few questions:

Q1) Does SageMaker Neo support object detection models?

Q2) If yes, does it support compilation of pre-trained OD models?

My data input has the shape (1, None, None, 3), where a "None" dimension means it can take any size. I have tried setting SageMaker Neo's data input configuration to {"input":[1,3,null,null]} and to {"input":[1,3,224,224]}, but the job still fails with the same error mentioned above.

Q3) What is the correct way to specify the input configuration for OD models whose width and height dimensions can take any value?
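The two configurations mentioned above can be generated from the model's channels-last shape, which avoids hand-editing the JSON. The sketch below builds the configuration string and a request dictionary in the shape that boto3's `create_compilation_job` expects. The helper names, the 900-second timeout, and any job name, role ARN, or S3 URI you pass in are illustrative assumptions. The input name `input` is taken from the post. Whether Neo accepts `null` dimensions is exactly the open question here, so the code only shows how such a config would be serialized.

```python
import json
from typing import Dict, List, Optional, Sequence

Dim = Optional[int]


def nhwc_to_nchw(shape: Sequence[Dim]) -> List[Dim]:
    """Reorder a 4-D (N, H, W, C) shape into (N, C, H, W)."""
    if len(shape) != 4:
        raise ValueError("expected a 4-D shape (N, H, W, C)")
    n, h, w, c = shape
    return [n, c, h, w]


def data_input_config(input_name: str, nhwc_shape: Sequence[Dim]) -> str:
    """Serialize the input shape as the JSON string used for DataInputConfig."""
    # None is written as JSON null, matching the first config tried in the post.
    return json.dumps({input_name: nhwc_to_nchw(nhwc_shape)})


def compilation_request(job_name: str, role_arn: str, model_s3_uri: str,
                        output_s3_uri: str, input_config: str) -> Dict:
    """Build keyword arguments for a SageMaker create_compilation_job call."""
    return {
        "CompilationJobName": job_name,
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            "DataInputConfig": input_config,
            "Framework": "KERAS",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": "jetson_xavier",
        },
        # Assumed timeout; pick whatever limit suits the job.
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }
```

The resulting dictionary could be passed as `boto3.client("sagemaker").create_compilation_job(**request)`. Building the request separately makes it easy to print and inspect it before submitting a job.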

wuchih-amazon commented 4 years ago

Hi @shitijkarsolia, yes to all three questions. The input format is correct as well. There was a server-side issue recently. Would you mind trying the compilation job again? In addition, any problem associated with a compilation failure should return a client error instead. If you still encounter problems, please file a ticket with the associated job ARN. Since this is an open forum, I'd suggest that you refrain from publicly disclosing more information.
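To collect the details for such a ticket, the response from boto3's `describe_compilation_job` can be reduced to the fields that matter. This is a rough sketch: the helper names are hypothetical, and it assumes the failure reason begins with an error-kind prefix followed by a colon, as in the "ServerError: ..." message quoted earlier in the thread.

```python
from typing import Dict


def error_kind(failure_reason: str) -> str:
    """Return the prefix before the first colon, e.g. 'ServerError' or 'ClientError'."""
    prefix, sep, _ = failure_reason.partition(":")
    # No colon means the message has no recognizable prefix.
    return prefix.strip() if sep else ""


def job_summary(response: Dict) -> Dict[str, str]:
    """Pick the ARN, status, and failure details out of a describe response."""
    reason = response.get("FailureReason", "")
    return {
        "arn": response.get("CompilationJobArn", ""),
        "status": response.get("CompilationJobStatus", ""),
        "failure_reason": reason,
        "error_kind": error_kind(reason),
    }
```

A response would come from `boto3.client("sagemaker").describe_compilation_job(CompilationJobName=...)`. As noted above, the ARN belongs in a support ticket rather than in a public thread.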

shitijkarsolia commented 4 years ago

Hey @wuchih-amazon, I tried to compile again but the job failed again with the same ServerError message as before. I'll try to contact AWS support. Thanks!

UTkzhang commented 3 years ago

Have there been any updates on this issue? @shitijkarsolia

shitijkarsolia commented 3 years ago

@UTkzhang There have been many upgrades to Neo since this issue. To answer my original questions:

Ans1&2: Neo currently doesn't support compilation of object detection models for every framework; Keras and TF, for example, are not supported. It does support TFLite, MXNet, and the newly added Darknet framework. This is a great upgrade, since YOLO-based OD models are trained using Darknet. This link can provide you with more detail about the frameworks and their support. I haven't personally tested this out yet.

Ans3: In my last communication with the AWS SageMaker Neo team, they mentioned that they had added the ability to accept dynamic input shapes for model compilation. I'm not certain whether the feature has been released or is still in the pipeline, since there is currently no documentation describing exactly how this can be done.

kbordac commented 3 years ago

@shitijkarsolia According to https://aws.amazon.com/blogs/machine-learning/model-dynamism-support-in-amazon-sagemaker-neo/ it looks like Neo supports object detection for TF. It is confusing that both links are from AWS but give different information.

shitijkarsolia commented 3 years ago

@kbordac Thanks for sharing. It seems AWS only recently released support for Neo compilation of TensorFlow object detection models. The link I shared is outdated.