awslabs / multi-model-server

Multi Model Server is a tool for serving neural net models for inference
Apache License 2.0

bad inference prediction results from ResNet50 ONNX models #847

Open tahouse opened 5 years ago

tahouse commented 5 years ago

I'm getting bad inference results from ResNet ONNX models. The image of the cat used in the example is reported as a pool table when using ResNet v1 or v2. This was initially reported in issue #284. Has this been resolved, or is there a workaround?

I'm running AWS DLAMI v24.1 on p3.2xlarge.

MMS v1.0.5

(mxnet_p36) [ec2-user@ip-172-31-47-248 ~]$ pip show mxnet-model-server
Name: mxnet-model-server
Version: 1.0.5
Summary: Model Server for Apache MXNet is a tool for serving neural net models for inference
Home-page: https://github.com/awslabs/mxnet-model-server
Author: MXNet SDK team
Author-email: noreply@amazon.com
License: Apache License Version 2.0
Location: /home/ec2-user/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages
Requires: model-archiver, Pillow, future, psutil

I've followed the directions in the model zoo, and ResNet50 v1 and v2 both give badly wrong predictions. I'm guessing the input image isn't being fed properly to the models.
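To make the "input isn't being fed properly" guess concrete, here is a minimal sketch of the preprocessing that ImageNet-trained ResNet models commonly expect: scale pixels to [0, 1], normalize per channel with the usual ImageNet mean and standard deviation, and reorder to NCHW. Whether the handler bundled in the .mar file does or skips this is not confirmed anywhere in this thread. The function name `preprocess` and the dummy input are hypothetical, and the sketch assumes the image has already been decoded to RGB and resized to 224x224.

```python
import numpy as np

# Commonly used ImageNet per-channel statistics (RGB order), applied after
# scaling pixels to [0, 1]. Assumption: the ResNet50 ONNX model was trained
# with this normalization; check the model's own documentation.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image_hwc_uint8):
    """Turn a 224x224 RGB uint8 image (H, W, C) into a (1, 3, 224, 224) float32 batch."""
    if image_hwc_uint8.shape != (224, 224, 3):
        raise ValueError("expected an RGB image already resized to 224x224")
    x = image_hwc_uint8.astype(np.float32) / 255.0  # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD          # per-channel normalization
    x = np.transpose(x, (2, 0, 1))                  # HWC -> CHW
    return x[np.newaxis, ...]                       # add the batch dimension


# Hypothetical stand-in for a decoded, resized kitten.jpg.
dummy = np.full((224, 224, 3), 128, dtype=np.uint8)
batch = preprocess(dummy)
print(batch.shape, batch.dtype)
```

If a handler skips the normalization or feeds HWC data where the model expects CHW, the model still returns scores, but they can be confidently wrong in the way shown below.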

Command to start the model server

mxnet-model-server --start --models squeezenet=https://s3.amazonaws.com/model-server/model_archive_1.0/squeezenet_v1.1.mar resnet50-v1=https://s3.amazonaws.com/model-server/model_archive_1.0/onnx-resnet50v1.mar

Inference requests:

$ curl -X POST http://127.0.0.1:8080/predictions/squeezenet -T kitten.jpg
[
  {
    "probability": 0.8582226634025574,
    "class": "n02124075 Egyptian cat"
  },
  {
    "probability": 0.09160050004720688,
    "class": "n02123045 tabby, tabby cat"
  },
  {
    "probability": 0.037487514317035675,
    "class": "n02123159 tiger cat"
  },
  {
    "probability": 0.0061649843119084835,
    "class": "n02128385 leopard, Panthera pardus"
  },
  {
    "probability": 0.003171598305925727,
    "class": "n02127052 lynx, catamount"
  }
]

$ curl -X POST http://127.0.0.1:8080/predictions/resnet50-v1 -T kitten.jpg
[
  {
    "probability": 177.36087036132812,
    "class": "n03982430 pool table, billiard table, snooker table"
  },
  {
    "probability": 174.36253356933594,
    "class": "n03942813 ping-pong ball"
  },
  {
    "probability": 172.44493103027344,
    "class": "n03661043 library"
  },
  {
    "probability": 163.64395141601562,
    "class": "n02788148 bannister, banister, balustrade, balusters, handrail"
  },
  {
    "probability": 159.49754333496094,
    "class": "n03065424 coil, spiral, volute, whorl, helix"
  }
]

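One more observation on the resnet50-v1 output: the "probability" values are far above 1, so they look like raw scores (logits) rather than softmax probabilities. A minimal sketch of a numerically stable softmax over the five reported scores follows. The function name `softmax` is my own, and because only the top five scores are available here, the result is normalized over those five rather than over all 1000 ImageNet classes, so it only illustrates the conversion. It does not fix the wrong class ranking.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# The five raw values reported for resnet50-v1 above.
top5_scores = [
    177.36087036132812,
    174.36253356933594,
    172.44493103027344,
    163.64395141601562,
    159.49754333496094,
]

top5_probs = softmax(top5_scores)
for score, prob in zip(top5_scores, top5_probs):
    print(f"{score:.2f} -> {prob:.6f}")
```

A handler would normally apply this over the full output vector before selecting the top five classes.
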
vdantu commented 5 years ago

@tahouse : The result is indeed very funny. I think we need to look into the resnet-50 model. To unblock yourself, I would recommend taking any resnet-50 model (from MXNet or PyTorch) and using the model-archiver tool to create your own model file to load onto MMS. In the meantime, we can look into the issues with the model.

tahouse commented 5 years ago

Thanks for the response. I tried the model archiver with a Gluon Vision pretrained model, following the SqueezeNet tutorial but substituting an exported ResNet50 model. The model archiver never completes, and the output .mar file keeps growing (it reached 30 GB before I killed the process). Maybe the handler needs modification?

vdantu commented 5 years ago

@tahouse : Could you share steps to reproduce this? Since it's a pretrained model, please provide the exact steps you used to prepare the model artifacts and the exact model-archiver command you ran.