apple / turicreate

Turi Create simplifies the development of custom machine learning models.

Different behavior between Model and Core ML model #1016

Closed cianiandreadev closed 5 years ago

cianiandreadev commented 6 years ago

I successfully trained an object detection model (the default TC YOLO) with TC b3 and exported it in Core ML format. The model was trained for 8000 iterations and reached a final loss of 0.8.

I then validated it on some images using TC and the bounding-box drawing utility, and it recognizes them better than I expected!

I then downloaded the sample project for recognizing objects in live capture presented by @znation at WWDC and replaced the model in the project with my new one.

What's weird is that the objects are no longer recognized. It is NOT a problem with VNDetectedObjectObservation, because observations are correctly returned, but the class and the bounding box do not represent the detected object correctly (different class and wrong bounding box). My development environment is iOS 12 beta 9 and Xcode 10 beta 6, with a 2017 (or 2016, I don't remember) iPad Pro.

From my first tests it seems this could be a rotation issue, but I don't know whether that is the real cause or how to fix it.

Has anybody faced a similar issue, or can anyone help me with this?

cianiandreadev commented 6 years ago

I think I found the problem: my model is orientation-dependent, while the breakfast-recognition model is orientation-independent (in terms of objects). My model is trained to recognize objects in only one orientation (and it doesn't even make sense for me to train for other orientations, because I know the orientation will always be the same).

Now, this is a pure iOS/Vision problem and not related to TC, but if anyone knows how to rotate the CMSampleBuffer before sending it to the Vision framework, a hint would be really appreciated.
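
Not a fix confirmed anywhere in this thread, but for reference: instead of rotating the CMSampleBuffer itself, Vision can simply be told how the buffer is oriented when the request handler is created. A minimal Swift sketch; the visionRequests array and the .right default are assumptions, not code from the sample project:

import AVFoundation
import Vision

// Hedged sketch: pass an orientation to Vision rather than rotating the buffer.
// `visionRequests` is a hypothetical array of VNCoreMLRequest built elsewhere.
func runDetection(on sampleBuffer: CMSampleBuffer,
                  visionRequests: [VNRequest],
                  orientation: CGImagePropertyOrientation = .right) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    // .right is a common choice for a portrait-held device whose camera
    // delivers landscape buffers; adjust for your capture setup.
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                        orientation: orientation,
                                        options: [:])
    do {
        try handler.perform(visionRequests)
    } catch {
        print("Vision request failed: \(error)")
    }
}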

TobyRoseman commented 6 years ago

@cianiandreadev - glad you figured the issue out. Thanks for letting us know about the fix.

nickjong commented 6 years ago

I'm not an expert, but in the sample code there's a spot in VisionObjectRecognitionViewController's captureOutput method where a CVPixelBuffer/CVImageBuffer is retrieved from the CMSampleBuffer. The image could probably be rotated there.

But the very next line, creating the VNImageRequestHandler, takes an orientation argument...

cianiandreadev commented 6 years ago

Thanks @nickjong for your reply. Yes, the orientation is indeed passed. I also tried to force the video connection into portrait mode using videoDataOutput.connection(with: AVMediaType.video)?.videoOrientation = .portrait but in the end I still don't get the correct result. I'll post the screenshots, maybe they can help:

img_0069

In the photo itself you can see the object recognized perfectly by the TC model; the yellow box is instead the Core ML + Vision recognition.

The project I'm using is the breakfast example presented by @znation with the portrait edit (without it the model wasn't able to recognize the object at all). I also slightly modified the rectangle to just show the border instead of filling it.

Any idea why there is such a big difference between Core ML and TC?

znation commented 6 years ago

Reopening - this difference is not expected. Let's make sure it can be explained by the rotation/orientation (and see if we can fix it that way) before concluding that this isn't a bug.

cianiandreadev commented 6 years ago

Thanks a lot @znation for your answer. I double-checked the orientation and I don't think it is related to that now.

I just submitted a new screenshot with multiple animals, in case it helps. The animals are perfectly recognized with very good confidence, so I would say that the model works well.

The red overlay is the detectionOverlay, which I made visible to check that it is correct (and it seems to be, since the camera is in 640x480 mode).

The code I used is 99% based on the one presented at WWDC (I linked it in the first post of this issue). I can share any snippet of it if that helps.

img_4d64257a1abd-1

nickjong commented 6 years ago

From these screenshots, it looks like you're comparing the results across two scenarios:

  1. running in Turi Create on a Mac, taking a photo as input
  2. running in CoreML (+ Vision) on an iPhone, taking an image from the camera pointed at a screen showing the photo

What happens if, on the phone, you save off the image you're evaluating, load it in Turi Create on your Mac, and then run inference from there? (A minimal frame-saving sketch follows below.)

@gustavla Do we expect our data augmentation to make the model robust to pictures of a screen showing data from the original training distribution? Did you try this use case in your work?
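
For anyone wanting to try that comparison, here is a small, hypothetical Swift sketch (not part of the sample project) for dumping the exact frame that is sent to the model, so it can be copied off the device and re-run through Turi Create on a Mac:

import CoreImage
import CoreVideo
import Foundation

// Hedged sketch: write the CVPixelBuffer being evaluated to the app's
// Documents directory as a JPEG for offline comparison.
func saveFrame(_ pixelBuffer: CVPixelBuffer, name: String) {
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
    let context = CIContext()
    guard let colorSpace = CGColorSpace(name: CGColorSpace.sRGB),
          let jpegData = context.jpegRepresentation(of: ciImage,
                                                    colorSpace: colorSpace,
                                                    options: [:]) else { return }
    let url = FileManager.default.urls(for: .documentDirectory,
                                       in: .userDomainMask)[0]
        .appendingPathComponent("\(name).jpg")
    // Retrieve the file later via Xcode's Devices window or the Files app.
    try? jpegData.write(to: url)
}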

cianiandreadev commented 6 years ago

@nickjong I had actually already tried your approach, and it didn't work either. Here is an example of a picture taken with the iPhone and passed back to TC for evaluation:

unknown

If I scan them with Core ML: same issue.

cianiandreadev commented 6 years ago

Any news on this issue? Any other ideas/approaches for how I can try to fix it? I've really tried everything I could and still haven't found a solution 🙁

sejersbol commented 6 years ago

I have seen some of the same issues in my initial tests. I was wondering if it happens because the bounding boxes are drawn after the camera has moved? I.e. in your example https://user-images.githubusercontent.com/1215511/44509464-0dc1e300-a6b2-11e8-8dfe-28cbc530ebe9.jpeg, has the camera been moved down a little before the bounding boxes were drawn? I don't have any good examples to show right now...

cianiandreadev commented 6 years ago

Nope. I try to keep it as steady as possible to avoid that kind of artifact. I also tried to "manually" center the bounding boxes on the animals (by moving the phone), and in the next frame the evaluation moves them down again. @sejersbol Do you have a working project? Maybe a snippet I can have a look at?

sejersbol commented 6 years ago

@cianiandreadev I used the breakfast sample just like you - only changed the mlmodel file. My guess is that the problem must reside with the model itself.

cianiandreadev commented 6 years ago

@sejersbol I have the same feeling. But since the Turi Create version works flawlessly, it must be the conversion between TC and Core ML. I will try to convert again today.

sejersbol commented 6 years ago

@cianiandreadev Using the sample breakfast project with only the mlmodel file changed, I get the screenshots below from my initial test model (all about insects ;-)).

img_0016 img_0018 img_0005

It looks like the same problem you have, don't you agree?

cianiandreadev commented 6 years ago

Absolutely. I think we both have the same issue. @znation, what is your opinion?

znation commented 6 years ago

@cianiandreadev I agree, it seems like the problem is in the conversion to .mlmodel for these models. Based on what we are seeing, I suspect there is a bug in the conversion process. Would you be willing to share your Turi Create saved model (comes out as a directory; typically we name them .model or .tcmodel), and your .mlmodel? We can try to debug why the predictions are coming out differently between the two.

cianiandreadev commented 6 years ago

@znation Actually I have to ask my company, but I think I can easily persuade the people-that-count to share it privately with you. Is that OK? Can you give me an e-mail address I can send the files to? Thanks for your support.

znation commented 6 years ago

@cianiandreadev That would be great, thanks! My email address is in my profile.

cianiandreadev commented 6 years ago

@znation, as requested, on Thursday 08-30 I sent you the e-mail with the subject "Andrea Ciani - Obj detection issue 1016". Can you kindly tell me whether you received it correctly? Hope it helps. 🙂 Thanks in advance.

srikris commented 6 years ago

We are investigating. Thanks!

swupnil commented 6 years ago

I would like to chime in here. I have a turicreate model in production (via conversion to CoreML) on the App Store, and it works fine on iOS 11 / watchOS 4 and older devices.

However, the same model does not work on iOS 12 / watchOS 5 devices. My model is an LSTM that classifies user activities based on accelerometer data from the Apple Watch, and when I run the app built with the latest Xcode on a watchOS 5 device, the output is constantly bogus and makes no sense (my hand is still, yet it predicts I am doing an activity that requires a lot of motion).

The same exact model with the same exact build of the app works fine on my watchOS 4 device. I am certain this is a model conversion issue.

(I have also tried this with a quantized version of the model that I created on High Sierra, and that model has the same issues.)

I am very concerned about this as my app's CoreML feature will basically stop working when watchOS 5 comes out and all of my users upgrade :(

srikris commented 6 years ago

@swupnil I'm opening a separate issue for this. Can you share your model there?

swupnil commented 6 years ago

@srikris I would prefer to discuss this and share models over email. Can you please email me at swupnil@swing.tennis?

I am very concerned that our user base will not be able to use the CoreML features of our app once watchOS 5 is publicly released.

gustavla commented 6 years ago

One thing that comes to mind for me is that the aspect ratio is close to 2:1 on the phone, which means when we resize it down to a square to pass it through the network, it has undergone quite a lot of deformation. We have data augmentation to create robustness for this, but perhaps not that extreme.

For debugging purposes, let's compare Turi Create and Core ML results side by side. You can make a Core ML prediction through Python using coremltools (and then run the same image through Turi Create):

from PIL import Image
import coremltools

mlmodel = coremltools.models.MLModel('detector.mlmodel')
pil_img = Image.open('image.png')
mlmodel.predict({'image': pil_img})  # adjust 'image' depending on the MLModel's input name

@sejersbol @cianiandreadev You can both test this as a first step to make sure the conversion was successful.

cianiandreadev commented 6 years ago

@gustavla thanks for the input. I didn't know about this feature. My question is: since the input to the model is always the same in both cases (iOS or Turi Create), why don't we have this problem on the Turi Create side? Is there a better resize on the Turi Create side? I will try your suggestion and see what happens. I'll update you ASAP.

baihualinxin commented 6 years ago

The bounding boxes I get through the Vision framework are not accurate.

Is this a training problem, or a conversion problem?

I can provide an mlmodel: https://github.com/apple/turicreate/issues/1079 @znation @cianiandreadev

cianiandreadev commented 5 years ago

Any news on this? 🙁 Can we help in some way?

shantanuchhabra commented 5 years ago

Hey @cianiandreadev , really sorry for the late follow up. We're looking into this and will get back to you with an update as soon as we can!

cianiandreadev commented 5 years ago

@shantanuchhabra Thanks for your update. Really looking forward to seeing it working :)

ghop02 commented 5 years ago

Hi @shantanuchhabra, apologies if this is the incorrect place to post this, but I think I am seeing a similar bug. With most models on iOS 12, I am seeing part of the image being cropped unnecessarily. In the debugger I captured the input image and the corresponding output (from our style transfer model).

imagebuffer stylizedbuffer

You can see that the image is cut off at the top and bottom, and the output is then stretched. I realize this may not be the correct thread, but it seems to be a similar problem. We do not see this issue when feeding the image directly into the Core ML model (not using the Vision framework).

I also filed a bug (problem 44237403).

Thanks in advance!

shantanuchhabra commented 5 years ago

@ghop02 Thanks for bringing this to our attention! We're working on your bug report and posted an update on the thread there.

ghop02 commented 5 years ago

Thanks @shantanuchhabra. I don't actually see an update on that thread, but maybe I'm just looking at it incorrectly.

kinergy commented 5 years ago

I am having the same problem - bug submitted 44742649 with sample code. iOS 11.4.1 is no longer being signed, and iOS 12.0.1 still doesn't work.

kinergy commented 5 years ago

I have determined that this isn't just a Vision issue. Using our model directly with Core ML produces varying results depending on which hardware it runs on, or whether it runs in the simulator.

ghop02 commented 5 years ago

Interesting, @kinergy. For us, using the model directly with Core ML produces the expected results.

SpencerKaiser commented 5 years ago

I'm experiencing similar problems and I'm also seeing some weird behavior with confidence. When I get a hit on an object, it always has a confidence of 1.0, which seems a little questionable.

@gustavla I tried the snippet you added above for testing the MLModel, and it results in the following error:

{
    NSLocalizedDescription = "Failed to evaluatue model 0 in pipeline";
    NSUnderlyingError = "Error Domain=com.apple.CoreML Code=0 \"Required input feature not passed to neural network.\" UserInfo={NSLocalizedDescription=Required input feature not passed to neural network.}";
}

Any idea what the issue could be?

andrewgleave commented 5 years ago

This is the output for the same set of 5 images, from TC directly and via the Core ML model generated by TC.

Classes are: ['hammer', 'mallet', 'saw', 'spanner']

You can see that the confidences are not the same for the same images (img1: TC: 53% "mallet", Core ML: 51% "mallet") and that the number of detections is entirely different, e.g. img2: TC detected 3 objects ("hammer", "mallet" and "saw") while Core ML detected 2 ("saw" and "mallet").

I have formatted the output as JSON for readability and easy comparison.

Turi Create

[
  [],  // img0
  [
  // img1
    {
      "confidence": 0.5330119460790017,
      "type": "rectangle",
      "coordinates": {
        "y": 349.0502166810225,
        "x": 363.1568311292976,
        "width": 47.589577894944455,
        "height": 104.67505821814905
      },
      "label": "mallet"
    }
  ],
  [
  // img2
    {
      "confidence": 0.7227546304968571,
      "type": "rectangle",
      "coordinates": {
        "y": 311.1036779317327,
        "x": 391.10805598230706,
        "width": 95.9827599158653,
        "height": 35.754324839665344
      },
      "label": "mallet"
    },
    {
      "confidence": 0.6302023798550092,
      "type": "rectangle",
      "coordinates": {
        "y": 353.9289310963444,
        "x": 214.99848938061638,
        "width": 69.83673095703125,
        "height": 140.37098517784705
      },
      "label": "saw"
    },
    {
      "confidence": 0.2668375713857364,
      "type": "rectangle",
      "coordinates": {
        "y": 206.30439410686503,
        "x": 230.99165211068106,
        "width": 46.683355478140044,
        "height": 106.4533204298753
      },
      "label": "hammer"
    }
  ],
  [
  // img3
    {
      "confidence": 0.9285530193537053,
      "type": "rectangle",
      "coordinates": {
        "y": 252.4806663563403,
        "x": 212.0289733341525,
        "width": 51.05623685396637,
        "height": 152.18538137582632
      },
      "label": "saw"
    }
  ],
  [
  // img4
    {
      "confidence": 0.9527265307086529,
      "type": "rectangle",
      "coordinates": {
        "y": 300.42011761439545,
        "x": 270.080183376321,
        "width": 124.64950561523443,
        "height": 112.85323509803186
      },
      "label": "saw"
    }
  ]
]

CoreML


[
  // img0
  {
    "confidence": "array([], shape=(0, 4), dtype=float64)",
    "coordinates": "array([], shape=(0, 4), dtype=float64)"
  },
  // img1
  {
    "confidence": "array([[0.00092554, 0.51025391, 0.00391006, 0.00431824]])",
    "coordinates": "array([[0.56640625, 0.72851562, 0.07305908, 0.21313477]])"
  },
  // img2
  {
    "confidence": "array([[2.56061554e-04, 3.97443771e-04, 7.51464844e-01, 6.42299652e-04], [1.80339813e-03, 7.26562500e-01, 4.86373901e-03, 1.27887726e-03]])",
    "coordinates": "array([[0.33447266, 0.73681641, 0.10754395, 0.29394531], [0.61083984, 0.64697266, 0.14831543, 0.07611084]])"
  },
  // img3
  {
    "confidence": "array([[0.00162029, 0.00136662, 0.92138672, 0.00336266]])",
    "coordinates": "array([[0.32983398, 0.52539062, 0.07958984, 0.31738281]])"
  },
  // img4
  {
    "confidence": "array([[6.56843185e-05, 2.13980675e-05, 9.49218750e-01, 2.38060951e-04]])",
    "coordinates": "array([[0.42285156, 0.625, 0.1953125 , 0.23803711]])"
  }
]
SpencerKaiser commented 5 years ago

@andrewgleave how were you able to use your CoreML model in Python? I tried using @gustavla's suggestions but I got an error. More info in my comment above.

The only thing I'm concerned about is that I built my model with images that have transparency and had to use a workaround provided by @gustavla to get it to work. Now I'm concerned my model is looking for a different field/value, resulting in that crash. This is my first stab at anything ML, so I'm definitely a little out of my comfort zone here!

andrewgleave commented 5 years ago

As above. The code is very similar:

import glob

from PIL import Image
import coremltools

mlmodel = coremltools.models.MLModel('XYZ.mlmodel')

for path in glob.glob('../shots/*.JPG'):
    img = Image.open(path)
    # resize according to the network's input image shape
    resized = img.resize((416, 416), Image.LINEAR)
    print(str(mlmodel.predict({'image': resized})) + '\n')

Note: the choice of resampling filter affects the predictions. I don't think it's enough to account for what I'm seeing above; maybe I'm wrong, but no alternative filter gave me results comparable to TC.

yousifKashef commented 5 years ago

We're seeing this problem too. Core ML model performance improves when you turn the phone sideways to test the object detection model. Any known fix yet?

johnyquest7 commented 5 years ago

I am experiencing the same problem. It looks like the image is cropped differently on the iPhone. I am using my model for image classification. Using the image picker, it looks like the frame is shifted upwards: the lower part of the image is cropped and more of the upper part is included.

johnyquest7 commented 5 years ago

If I give it a perfectly square image, it crops the bottom!

philimanjaro commented 5 years ago

I have been struggling with this same issue as well when providing my own Core ML model, created with Turi Create 5.1, to the BreakfastFinder. The bounding boxes drawn within the app are badly offset.

As another user mentioned in this issue, I too am getting ridiculously high confidence scores with the new default Turi Create Core ML export format -- all recognized objects come back with confidence scores of 0.999+, which makes filtering out false positives very difficult. When I export in the older Core ML format and use a different demo app, the confidence scores are at reasonable and expected levels, false positives are easy to filter out, and the bounding boxes work quite a bit better.

If I hold the phone in portrait mode and place the object to be detected at the top of the view, the bounding box is close-ish to the object, but several pixels lower than it should be. The problem gets worse as I move the object lower towards the bottom of the phone. The lower the object is inside the camera's view, the worse the bounding box offset placement is. Holding the phone in landscape mode doesn't really seem to make much of a difference and the offset still varies depending on where the object is detected on screen.

I can't figure out why this offset is so severe, or why the confidence scores are so high (even for false positives on just about any 'detected' object) when using the default Core ML export options in Turi Create 5.0+.

(see attached GIF)

screenrecording_101820181316
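
One thing worth ruling out when debugging an offset like this (a hedged suggestion, not the confirmed root cause in this thread): Vision reports bounding boxes in normalized coordinates with the origin in the lower-left, while UIKit layers use an upper-left origin, and a missed y-flip produces exactly this kind of vertical shift. A small Swift sketch of the conversion, with the buffer dimensions assumed to be passed in by the caller:

import UIKit
import Vision

// Hedged debugging sketch: scale a normalized Vision bounding box to pixel
// coordinates and flip the y-axis for a top-left-origin overlay layer.
func overlayRect(for observation: VNRecognizedObjectObservation,
                 bufferWidth: Int, bufferHeight: Int) -> CGRect {
    let rect = VNImageRectForNormalizedRect(observation.boundingBox,
                                            bufferWidth, bufferHeight)
    return CGRect(x: rect.origin.x,
                  y: CGFloat(bufferHeight) - rect.origin.y - rect.height,
                  width: rect.width,
                  height: rect.height)
}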

sumac13 commented 5 years ago

I am seeing behaviour similar to @philimanjaro's: the bounding boxes always appear lower than they should be, and the issue gets worse as the object moves lower in the camera frame. I am also seeing this behaviour when running the template breakfast app, with both the default model and my trained model, on a device running 12.0.1.

gustavla commented 5 years ago

Really sorry it has taken so long to track this issue down. This problem can be resolved by adding

objectRecognition.imageCropAndScaleOption = .scaleFill

on your VNCoreMLRequest object. I think different crop-and-scale options should still work reasonably (even though we recommend .scaleFill, since that is how the model is trained), so we are definitely following up on that as well.
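
For context, a minimal sketch of how that option fits into the request setup; the makeObjectRecognitionRequest name is made up, and the MLModel is assumed to be loaded elsewhere:

import CoreML
import Vision

// Hedged sketch: build the VNCoreMLRequest with the crop-and-scale option
// that matches how the Turi Create model was trained (squash the full frame
// to the network's input size instead of center-cropping it).
func makeObjectRecognitionRequest(for coreMLModel: MLModel) throws -> VNCoreMLRequest {
    let visionModel = try VNCoreMLModel(for: coreMLModel)
    let objectRecognition = VNCoreMLRequest(model: visionModel) { request, error in
        // Handle the [VNRecognizedObjectObservation] results here.
    }
    objectRecognition.imageCropAndScaleOption = .scaleFill
    return objectRecognition
}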

philimanjaro commented 5 years ago

> Really sorry it has taken so long to track this issue down. This problem can be resolved by adding
>
> objectRecognition.imageCropAndScaleOption = .scaleFill
>
> on your VNCoreMLRequest object. I think different crop-and-scale options should still work reasonably (even though we recommend .scaleFill, since that is how the model is trained), so we are definitely following up on that as well.

@gustavla, that resolved the bounding box issue completely on my end! Thank you for posting that solution.

Like other users, I am also still experiencing the very high confidence scores using the model -- many times clocking in at 0.999+. Is that issue being tracked somewhere else separately? This current issue seems to be tracking the bounding box issue (now resolved!) as well as the confidence score issue when using the model in the iOS app.

Thanks again.

andrewgleave commented 5 years ago

Unexpectedly high confidences are being tracked in #1314.

fengyiqicoder commented 5 years ago

Nice work

fengyiqicoder commented 5 years ago

🎉🎉🎉

brandonmaul commented 5 years ago

Awesome! thank you for the fix!