apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License

Mac hardware-dependent failed assertion in object detection model running on macOS #146

Closed: borderlineinteractive closed this issue 6 years ago

borderlineinteractive commented 6 years ago

Hi,

I am getting the following error when I run an object detection model based on the cats and dogs example in a command-line macOS project in Xcode:

validateComputeFunctionArguments:852: failed assertion Compute Function(TARR_elementwise_mul_f16_pack4): The pixel format (MTLPixelFormatRGBA32Float) of the texture (name:<null>) bound at index 2 is incompatible with the data type (MTLDataTypeHalf) of the texture parameter (src_b [[texture(0)]]). MTLPixelFormatRGBA32Float is compatible with the data type(s) (
    float
).
(lldb) 

What's especially unusual is that this error only appears on my iMac (Retina 5K, 27-inch, 2017) and not on my mid 2010 MacBook Pro. The code actually works perfectly on the mid 2010 MacBook Pro, as expected. I also noticed that the following message always appears when I run a Turi Create model on the 2017 iMac, and never on the mid 2010 MacBook Pro:

2017-12-29 19:22:36.740697+0100 CmdSand[29721:7166209] VPA info: plugin is INTEL.
VPA info: plugin is INTEL, AVD_id = 1080020, AVD_api.Create:0x125636cfa
2017-12-29 19:22:36.747578+0100 CmdSand[29721:7166209] AVD info: codecHALEnableHEVCDecoder = 1

Even on the 2017 iMac, everything works fine afterwards with an image classifier model, but with the object detection model I get the error message above. Unfortunately, I cannot find any useful documentation about the MTLPixelFormat formats to troubleshoot this. It seems to me that the HEVC capability of the new iMac is the cause of this problem. Is there any way to disable this within Xcode as a workaround?

I am using the following code that was mostly derived from the cats and dogs example:

import Foundation
import CoreImage
import Vision

// Minimal container for one detection result.
struct Prediction {
    let labelIndex: Int
    let confidence: Float
    let boundingBox: CGRect
}

// Start at 0 so the main thread blocks until the completion handler signals.
let semaphore = DispatchSemaphore(value: 0)

func output_handler2(request: VNRequest, error: Error?) {
    let results = request.results as! [VNCoreMLFeatureValueObservation]

    let coordinates = results[0].featureValue.multiArrayValue!
    let confidence = results[1].featureValue.multiArrayValue!

    let confidenceThreshold = 0.25
    var unorderedPredictions = [Prediction]()
    let numBoundingBoxes = confidence.shape[0].intValue
    let numClasses = confidence.shape[1].intValue
    let confidencePointer = confidence.dataPointer.assumingMemoryBound(to: Double.self)
    let coordinatesPointer = coordinates.dataPointer.assumingMemoryBound(to: Double.self)
    for b in 0..<numBoundingBoxes {
        // Find the class with the highest confidence for this box.
        var maxConfidence = 0.0
        var maxIndex = 0
        for c in 0..<numClasses {
            let conf = confidencePointer[b * numClasses + c]
            if conf > maxConfidence {
                maxConfidence = conf
                maxIndex = c
            }
        }
        if maxConfidence > confidenceThreshold {
            // Coordinates are center x/y plus width/height; convert to a CGRect.
            let x = coordinatesPointer[b * 4]
            let y = coordinatesPointer[b * 4 + 1]
            let w = coordinatesPointer[b * 4 + 2]
            let h = coordinatesPointer[b * 4 + 3]

            let rect = CGRect(x: CGFloat(x - w/2), y: CGFloat(y - h/2),
                              width: CGFloat(w), height: CGFloat(h))

            let prediction = Prediction(labelIndex: maxIndex,
                                        confidence: Float(maxConfidence),
                                        boundingBox: rect)
            unorderedPredictions.append(prediction)
        }
    }
    semaphore.signal()
}

let model_features = try VNCoreMLModel(for: detect_features().model)

let request_features = VNCoreMLRequest(model: model_features, completionHandler: output_handler2)

let link_str = "https://media-cdn.tripadvisor.com/media/photo-s/01/62/9d/5b/addo-national-park.jpg"
let imageURL = URL(string: link_str)!
if let inputImage = CIImage(contentsOf: imageURL) {
    let handler_features = VNImageRequestHandler(ciImage: inputImage)
    try handler_features.perform([request_features])
    semaphore.wait()
}

Best wishes,

Leif

gustavla commented 6 years ago

Thanks so much for reporting this in detail, Leif (@borderlineinteractive)! I'm really sorry about this issue. I am actively looking into this and will keep you posted.

borderlineinteractive commented 6 years ago

Thanks for looking into this. FYI: on both machines, the version of Xcode is 9.2 and of macOS is 10.13.2.

Best wishes,

Leif

zacharyblank commented 6 years ago

Similar error message for me. I am using an ARSCNView to capture the video frames on an iPhone X.

validateComputeFunctionArguments:852: failed assertion `Compute Function(TARR_elementwise_mul_f16_pack4): The pixel format (MTLPixelFormatRGBA32Float) of the texture (name:<null>) bound at index 2 is incompatible with the data type (MTLDataTypeHalf) of the texture parameter (src_b [[texture(0)]]). MTLPixelFormatRGBA32Float is compatible with the data type(s) (
    float
).'
zacharyblank commented 6 years ago

@gustavla I did a bit of looking into this and it seems that the error is a bit misleading. I believe that one of the destination textures in the model that is created is null for some operation. I don't know where to begin to fix this but I hope that helps.

Thanks very much!

smahurkar commented 6 years ago

Facing the same issue.

validateComputeFunctionArguments:852: failed assertion `Compute Function(TARR_elementwise_mul_f16_pack4): The pixel format (MTLPixelFormatRGBA32Float) of the texture (name:) bound at index 2 is incompatible with the data type (MTLDataTypeHalf) of the texture parameter (src_b [[texture(0)]]). MTLPixelFormatRGBA32Float is compatible with the data type(s) ( float ).' (lldb)

zacharyblank commented 6 years ago

@smahurkar how are you capturing the video (or image) frames? When I use an AVCaptureSession it works just fine. I only get the above error when I am using an ARSCNView session.

smahurkar commented 6 years ago

I am using ARSCNView as well.

let pixbuff: CVPixelBuffer? = sceneView.session.currentFrame?.capturedImage
if pixbuff == nil { return }

print("pixel buffer is of type")
print(CVPixelBufferGetPixelFormatType(pixbuff!))

let ciImage = CIImage(cvPixelBuffer: pixbuff!)

let imageRequestHandler = VNImageRequestHandler(ciImage: ciImage, options: [:])

do {
    try imageRequestHandler.perform(self.visionRequests)
} catch {
    print(error)
}

I have also tried sceneView.snapshot(), and then converted the UIImage to a CIImage.
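For reference, that snapshot route looks roughly like this (a sketch only; `sceneView` is assumed to be the existing ARSCNView, and note that CIImage(image:) can return nil):

```swift
import UIKit
import CoreImage
import ARKit

// Sketch of the snapshot-based capture path described above.
// `sceneView` is assumed to be an existing ARSCNView.
func captureViaSnapshot(from sceneView: ARSCNView) -> CIImage? {
    let snapshot: UIImage = sceneView.snapshot()  // renders the current frame to a UIImage
    return CIImage(image: snapshot)               // nil if the UIImage has no bitmap backing
}
```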

dmcgloin commented 6 years ago

Same issue. I'm using ARSCNView session, specifically: sceneView.session.currentFrame?.capturedImage

I can provide reproducible sample app if needed.

gustavla commented 6 years ago

Thanks everyone, this is really useful information! We are looking into the issue.

smahurkar commented 6 years ago

@gustavla Thank you for your help. I have a deadline to meet for the end of next week to integrate object detection with ARKit. I am using turicreate for this purpose. Do you think this issue will be resolved by next week?

gchiste commented 6 years ago

The pixel format (MTLPixelFormatRGBA32Float) of the texture (name:) bound at index 2 is incompatible with the data type (MTLDataTypeHalf) of the texture parameter (src_b [[texture(0)]]).

Exactly the same issue as dmcgloin, using sceneView.session.currentFrame?.capturedImage.

creative-intersection commented 6 years ago

The same error also occurs when calling the model directly via the predictionFromImage method. As reported above, it works from an AVCaptureOutput when converting from the CMSampleBuffer, but not from images captured in ARSession's didUpdateFrame. I have even tried converting the ARFrame.capturedImage to a CMSampleBuffer and then back to a CVPixelBufferRef, but it still results in the error.
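One more conversion variant worth trying (a sketch under the assumption that the problem is the ARFrame's bi-planar YCbCr pixel format; the function name is hypothetical) is to re-render the captured buffer into a fresh BGRA buffer via CIContext before handing it to the model:

```swift
import ARKit
import CoreImage
import CoreVideo

// Hypothetical sketch: copy an ARFrame's captured (YCbCr) pixel buffer
// into a new 32BGRA buffer. `frame` is assumed to come from ARSession's
// didUpdate callback.
func makeBGRAPixelBuffer(from frame: ARFrame,
                         context: CIContext = CIContext()) -> CVPixelBuffer? {
    let source = frame.capturedImage
    let width = CVPixelBufferGetWidth(source)
    let height = CVPixelBufferGetHeight(source)

    var converted: CVPixelBuffer?
    let attrs: [CFString: Any] = [kCVPixelBufferIOSurfacePropertiesKey: [:]]
    guard CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                              kCVPixelFormatType_32BGRA,
                              attrs as CFDictionary, &converted) == kCVReturnSuccess,
          let destination = converted else { return nil }

    // CIContext performs the YCbCr -> BGRA conversion during rendering.
    context.render(CIImage(cvPixelBuffer: source), to: destination)
    return destination
}
```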

smahurkar commented 6 years ago

@gustavla here is a sample project that reproduces the issue of object detection model with scenekit. https://github.com/smahurkar/ARObjectDetectionError

gustavla commented 6 years ago

@smahurkar Thanks!

I just wanted everyone to know that this problem remains high priority for us. It is a problem that likely needs to be addressed in macOS, so the time frame for rolling out a fix is unfortunately slower than if the problem had been inside Turi Create. I'll post an update as I know more.

srikris commented 6 years ago

@smahurkar Can you run sw_vers on your Mac and let us know what version of macOS you are running?

We want to confirm that we can reproduce the issue on your exact machine and OS version, and that our fix resolves it on a newer version of macOS.

smahurkar commented 6 years ago

Here are the results of sw_vers:

ProductName:    Mac OS X
ProductVersion: 10.12.6
BuildVersion:   16G1212

srikris commented 6 years ago

@smahurkar CoreML doesn't exist on that version so I'm not sure that your issue is the same as the one in this thread.

smahurkar commented 6 years ago

@srikris my issue, in particular, was with using ARKit Sceneview with the object detection model. The object detection model trained from Turi worked with AVFoundation. I have another mac:

ProductName:    Mac OS X
ProductVersion: 10.13.2
BuildVersion:   17C205

If you like, I can try training the model on the other mac and get back to you with results.

srikris commented 6 years ago

@smahurkar Great. This second Mac seems to be consistent with the bug. We'll get back to you soon with an update.

BTW, were you able to get your deployed model in the App Store?

smahurkar commented 6 years ago

@srikris No, I have not attempted deploying the model.

suyashgupta25 commented 6 years ago

I am facing the same issue. Is there any update regarding it?

LennartOlsen commented 6 years ago

I am using the example supplied at https://github.com/apple/turicreate/blob/master/userguide/activity_classifier/export_coreml.md. My only difference is that I supply the data from an external sensor, not from the internal CMAccelerometerData.

Running the application on my MacBook Pro 14,1 (macOS 10.13.3) as a native app works fine, but running the exact same framework on my iPhone SE (iOS 11.2.5) gives

validateComputeFunctionArguments:852: failed assertion `Compute Function(cnnConvArray_8xIn_8xOut_1_1): The pixel format (MTLPixelFormatRGBA32Float) of the texture (name:<null>) bound at index 0 is incompatible with the data type (MTLDataTypeHalf) of the texture parameter (srcArray [[texture(0)]]). MTLPixelFormatRGBA32Float is compatible with the data type(s) (
    float
).' 

I've tried tracing the error but I can't seem to figure out where it goes wrong. My predictor is https://github.com/LennartOlsen/SensorTagRaider/blob/bd6fbff9d582095c10c8dfa07b6c16d6b509b84d/SensorTagRaider/Controllers/Predictor/Predictor.swift#L59

srikris commented 6 years ago

@LennartOlsen Your issue is not the same as the others. I've filed a new issue #307 to track that.

srikris commented 6 years ago

@suyashgupta25 @smahurkar @borderlineinteractive Can you try upgrading to the latest 10.13.3 beta (https://beta.apple.com/sp/betaprogram/)? That should fix your issue.

srikris commented 6 years ago

@zacharyblank @dmcgloin It looks like your issue is not the same as the one tracked here. Can you share a sample project? I can make another issue.

smahurkar commented 6 years ago

@srikris I am unable to find a macOS public beta in the App Store.

smahurkar commented 6 years ago

@srikris Here are some system details:

ProductName:    Mac OS X
ProductVersion: 10.13.3
BuildVersion:   17D102

The app doesn't work with this build. Could you provide more details on how to upgrade to the required beta OS version?

I have followed the steps here: https://beta.apple.com/sp/betaprogram/redemption#macos and I don't see the option to download beta in App Store Updates.

smahurkar commented 6 years ago

@srikris I tried the latest version of the public Beta 10.13.4 Beta (17E160g) and the issue still exists. Do you need me to try a version from the developer Beta?

suyashgupta25 commented 6 years ago

In my case it arises when I run the app in debug mode from Xcode. It crashes on the perform method, called with an array of requests as a parameter.

srikris commented 6 years ago

@smahurkar Let me get back to you shortly on this.

gustavla commented 6 years ago

I was just made aware of a potential work-around:

In Xcode: Product -> Scheme -> Edit Scheme... -> Options -> Metal API Validation -> Set to Disabled

Please let me know if you have success with this. I'm told this gets around @zacharyblank's issue in particular, but I would like to hear from others too (ping @smahurkar @borderlineinteractive @LennartOlsen).

LennartOlsen commented 6 years ago

@gustavla That "solved" the problem for me; it runs nicely now. I can't say it's an acceptable long-term solution, though, as it seems a bit off.

Kudos anyhow!

borderlineinteractive commented 6 years ago

@gustavla

This works for me as well. If I understand this setting correctly, disabling Metal API validation should not reduce Core ML performance, and might even increase it. Thanks for reporting this very useful workaround.

gustavla commented 6 years ago

@LennartOlsen @borderlineinteractive Great! I'm glad to hear it worked. Of course, it does not make a proper solution any less of a priority for us.

smahurkar commented 6 years ago

@gustavla Thank you for the workaround. It worked. However, if I send continuous frames (without delay) for processing to the model I get the error:

[Technique] World tracking performance is being affected by resource constraints [1]

The application is then terminated due to a memory issue.

LennartOlsen commented 6 years ago

@gustavla, as also posted in #307, I found that my issue only occurs when forcing my application to use the Metal framework, with a scene view or any other Metal-enabled rendering method.

Don't know if this applies to anyone else.

dmcgloin commented 6 years ago

@srikris I provided an example app to @gustavla

srikris commented 6 years ago

@smahurkar Can you tell me your hardware version? Is it a MacBook Pro? If so, which model?

smahurkar commented 6 years ago

@srikris I tested using two MacBook Pros. The app was tested on an iPhone 7.

MacBook Pro (Retina, 13-inch, Early 2015)
MacBook Pro (Retina, 13-inch, Mid 2014)

zacharyblank commented 6 years ago

@gustavla that works for me! Thank you!!

jfgirard commented 6 years ago

@gustavla I'm having the same error running object detection on an iPhone SE/6S with iOS 11.3.1, on a model generated by Turi Create (4.3.2) on a MacBook Pro 2012 (10.13.4). I can either use the workaround (disable Metal validation) or set request.usesCPUOnly = true (VNCoreMLRequest) to make it work.
validateComputeFunctionArguments:852: failed assertion `Compute Function(TARR_elementwise_mul_f16_pack4): The pixel format (MTLPixelFormatRGBA32Float) of the texture (name:<null>) bound at index 2 is incompatible with the data type (MTLDataTypeHalf) of the texture parameter (src_b [[texture(0)]]). MTLPixelFormatRGBA32Float is compatible with the data type(s) ( float ).
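For anyone else trying the CPU-only route, a minimal sketch of that workaround (reusing the thread's detect_features model class; the request setup around it is illustrative):

```swift
import Vision
import CoreML

// Sketch of the usesCPUOnly workaround mentioned above. `detect_features` is
// the Core ML model class Xcode generates from the exported .mlmodel.
let model = try VNCoreMLModel(for: detect_features().model)
let request = VNCoreMLRequest(model: model) { request, error in
    // handle results here...
}
// Forcing CPU execution sidesteps the Metal half/float texture mismatch,
// at the cost of slower inference.
request.usesCPUOnly = true
```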

gustavla commented 6 years ago

@jfgirard Thanks for reporting this! I think you are the first to report this issue on iOS, where we have never seen this issue until now. I'll check around if anyone else has seen this.

minhohihi commented 6 years ago

To @gustavla

I'm facing the same issue on iOS 11.3 on an iPhone X. I split one huge mlmodel into two mlmodels and connected them sequentially (input image -> mlmodel1 -> mlmodel2 -> output). The output of mlmodel1 is of type MLMultiArray and is fed into mlmodel2. Then I get the message below. Your suggested workaround works fine, but is too slow.

validateComputeFunctionArguments:852: failed assertion `Compute Function(TARR_elementwise_add_k_f16_pack4): The pixel format (MTLPixelFormatRGBA32Float) of the texture (name:) bound at index 1 is incompatible with the data type (MTLDataTypeHalf) of the texture parameter (src_a [[texture(0)]]). MTLPixelFormatRGBA32Float is compatible with the data type(s) ( float ).'

srikris commented 6 years ago

@minhohihi This seems like a different issue from the one in the title. Can you make a new GitHub issue for it, and we can reply there?

minhohihi commented 6 years ago

@srikris I wrote a comment because gchiste reported the same problem (https://github.com/apple/turicreate/issues/146#issuecomment-357542708). I'll make a new issue. :)

srikris commented 6 years ago

Fixed in macOS 10.14.