apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

Sound Analysis throwing DSPGraph::Exception #704

Closed: osianSmith closed this issue 2 years ago

osianSmith commented 4 years ago

❓ When using SoundAnalysis with a model that takes in audio samples and returns a MultiArray (Float32, 12288), I receive the error DSPGraph::Exception.

I am using a different model that takes in audioSamples (MultiArray (Float32, 15600)) and returns vggishFeature (MultiArray (Float32, 12288)). You can download the model here. We want to use it as a feature extractor, since we want to run a different inference model on its output. When using SoundAnalysis I receive this stack trace:

0          0x1ba2b5464  <redacted> + 220
1          0x1ba22f050  <redacted> + 492
2          0x1ba20ae30  <redacted> + 120
3          0x1ba212774  <redacted> + 84
4          0x1ba21c40c  <redacted> + 216
5          0x1ba21c8b0  <redacted> + 268
6          0x10278b730  _dispatch_client_callout + 16
7          0x10279a488  _dispatch_lane_barrier_sync_invoke_and_complete + 124
8          0x1ba21c76c  <redacted> + 112
9          0x1024346c4  $s8SRTester18homeViewControllerC16startAudioEngine33_CDAAA73F093090436FCAC2E152DEFC64LLyyFySo16AVAudioPCMBufferC_So0M4TimeCtcfU_yycfU_ + 388
10         0x102434714  $sIeg_IeyB_TR + 56
11         0x10278a338  _dispatch_call_block_and_release + 24
12         0x10278b730  _dispatch_client_callout + 16
13         0x102792740  _dispatch_lane_serial_drain + 744
14         0x1027932e0  _dispatch_lane_invoke + 444
15         0x10279e6c4  _dispatch_workloop_worker_thread + 1304
[truncated?]
libc++abi.dylib: terminating with uncaught exception of type DSPGraph::Exception
(lldb)

I receive the error on this line:

    private func startAudioEngine() {
        self.isListeningForInference = true

        // Requests to use the engine.
        do {
            let request = try SNClassifySoundRequest(mlModel: soundClassifier.model)
            try analyzer.add(request, withObserver: resultsObserver) // sets the results observer
        } catch {
            print("Unable to prepare request: \(error.localizedDescription)")
            return
        }

        // Starts an async task for the analyzer.
        audioEngine.inputNode.installTap(onBus: 0, bufferSize: 16000, format: inputFormat) { buffer, time in
            self.analysisQueue.async {
                self.analyzer.analyze(buffer, atAudioFramePosition: time.sampleTime) // this line receives a SIGABRT
            }
        }

        do {
            try audioEngine.start()
        } catch {
            print("Error starting the audio engine: \(error.localizedDescription)")
        }
    }

anilkatti commented 4 years ago

We are looking into this. In the meantime, could you please file a bug report at bugreport.apple.com?

osianSmith commented 4 years ago

Sure! I have submitted it to bugreport.apple.com with reference code FB7702029.

osianSmith commented 4 years ago

Hey, I've had this back from bugreport.apple.com:

Engineering has provided the following information regarding this issue:

There are three issues going on here:

1. This is not a classifier model, it's a feature extractor model. I don't expect it to work with SNClassifySoundRequest, since it doesn't output a classification dictionary as the SoundAnalysis documentation states for SNClassifySoundRequest: "The provided model must accept audio data as input, and output a classification dictionary containing the probability of each category."

If the intention is to use SoundAnalysis to perform feature extraction instead of classification, then that sounds like a feature request for a new type of API.
If the intention is to use SoundAnalysis to perform classification, then the model provided should output a classification dictionary.

2. When printing the model description...

po model.model.modelDescription

inputs: (
    "audioSamples : MultiArray (Float32, 15600)"
)
outputs: (
    "vggishFeature : MultiArray (Float32, 12288)"
)
predictedFeatureName: classLabel
predictedProbabilitiesName: classLabelProbs 

...the model claims to have predictedFeatureName "classLabel" and predictedProbabilitiesName "classLabelProbs". However, the model does not actually provide output features with these names.

3. SoundAnalysis should return an error in SNClassifySoundRequest initWithMLModel when the user's model is malformed. This would prevent the exception thrown later by the framework at inference time. This third issue is tracked internally. 

From my understanding, the model needs its class-label outputs fixed, but SoundAnalysis is what is throwing the error? But based on "The provided model must accept audio data as input, and output a classification dictionary containing the probability of each category", we cannot use SoundAnalysis as a feature extractor, right?

Thanks

Osian

anilkatti commented 4 years ago

Hi @osianSmith

From my understanding, the model needs its class-label outputs fixed, but SoundAnalysis is what is throwing the error?

Correct. The error is because the API you are using (SNClassifySoundRequest) requires a classifier model (that outputs class and classProbabilities). That said, it should not have crashed. We are using the bug report you filed to address that.
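In the meantime, you can guard against this on the app side before creating the request. Here is a minimal sketch (assuming model is the MLModel you would pass to SNClassifySoundRequest; the helper name is just illustrative):

    import CoreML

    // Returns true only if the model both declares a predicted feature name
    // and actually exposes an output feature with that name. The model in
    // this issue fails the check: it declares predictedFeatureName
    // "classLabel" but its only output is "vggishFeature".
    func isWellFormedClassifier(_ model: MLModel) -> Bool {
        let description = model.modelDescription
        guard let predicted = description.predictedFeatureName else {
            return false // no predicted feature declared, so not a classifier
        }
        return description.outputDescriptionsByName[predicted] != nil
    }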

But based on "The provided model must accept audio data as input, and output a classification dictionary containing the probability of each category", we cannot use SoundAnalysis as a feature extractor, right?

That is correct as well. I'd encourage you to file a feature request if you'd like to use sound analysis framework for feature extraction.

osianSmith commented 4 years ago

Hi @anilkatti

Thank you! I have submitted this as a feature request to the feedback assistant.

anilkatti commented 4 years ago

@osianSmith

We have a workaround for you. Here is what you will need to do to implement your own front end (a sketch of these steps follows the list):

  1. Ensure the audio is at a 16 kHz sampling rate, 1 channel. You may be able to configure your recording API to do this, or use AVAudioConverter.
  2. Chunk the audio up into 15600-sample chunks.
  3. Pack the 15600 float samples into an MLMultiArray.
  4. Call prediction(from:) (predictionFromFeatures: in Objective-C) on your MLModel directly instead of using SNClassifySoundRequest.
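
A minimal sketch of those four steps (the feature names "audioSamples" and "vggishFeature" come from the model description above; the class and helper names are illustrative, and you should adapt the error handling to your app):

    import AVFoundation
    import CoreML

    final class FeatureExtractorFrontEnd {
        private let model: MLModel
        private var pending: [Float] = []
        private let chunkSize = 15600 // model input length (~0.975 s at 16 kHz)

        // Step 1 target format: 16 kHz, mono, Float32.
        private let targetFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                                 sampleRate: 16_000,
                                                 channels: 1,
                                                 interleaved: false)!

        init(model: MLModel) { self.model = model }

        // Feed each buffer from your input tap through here.
        func process(_ buffer: AVAudioPCMBuffer) throws {
            // Step 1: resample/downmix to 16 kHz mono with AVAudioConverter.
            guard let converter = AVAudioConverter(from: buffer.format, to: targetFormat) else { return }
            let ratio = targetFormat.sampleRate / buffer.format.sampleRate
            let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 1
            guard let converted = AVAudioPCMBuffer(pcmFormat: targetFormat,
                                                   frameCapacity: capacity) else { return }
            var consumed = false
            let status = converter.convert(to: converted, error: nil) { _, outStatus in
                if consumed {
                    outStatus.pointee = .noDataNow
                    return nil
                }
                consumed = true
                outStatus.pointee = .haveData
                return buffer
            }
            guard status != .error else { return }

            // Step 2: accumulate samples and cut them into 15600-sample chunks.
            let channel = converted.floatChannelData![0]
            pending.append(contentsOf:
                UnsafeBufferPointer(start: channel, count: Int(converted.frameLength)))
            while pending.count >= chunkSize {
                let chunk = Array(pending.prefix(chunkSize))
                pending.removeFirst(chunkSize)
                try predict(on: chunk)
            }
        }

        private func predict(on chunk: [Float]) throws {
            // Step 3: pack the 15600 float samples into an MLMultiArray.
            let array = try MLMultiArray(shape: [15600], dataType: .float32)
            for (i, sample) in chunk.enumerated() {
                array[i] = NSNumber(value: sample)
            }

            // Step 4: call prediction(from:) on the MLModel directly,
            // bypassing SNClassifySoundRequest entirely.
            let input = try MLDictionaryFeatureProvider(
                dictionary: ["audioSamples": MLFeatureValue(multiArray: array)])
            let output = try model.prediction(from: input)
            if let features = output.featureValue(for: "vggishFeature")?.multiArrayValue {
                print("Extracted \(features.count) VGGish features")
            }
        }
    }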

You will need to modify the model to expose the feature extractor's output as the pipeline model's output. Here's an example of working with pipeline models: https://github.com/apple/coremltools/blob/master/examples/updatable_models/updatable_tiny_drawing_classifier.ipynb

Also, you can use Netron to visualize the model and understand each sub-model's output. Let me know if you run into any issues.

anilkatti commented 4 years ago

@osianSmith I would still keep the feature request open since what I provided is a workaround. We could use your feature request to track a more natural solution to your problem.

osianSmith commented 4 years ago

Sure @anilkatti! I haven't had the chance to test it yet but will keep you updated as soon as I get around to it! Hoping to be able to try it by the end of next week!

osianSmith commented 4 years ago

Hi @anilkatti, that documentation is no longer available. Do you know where it has moved?

Thank you

Osian

anilkatti commented 4 years ago

Could you try this? https://coremltools.readme.io/docs/updatable-tiny-drawing-classifier-pipeline-model

TobyRoseman commented 2 years ago

Since we have not heard back here, I am going to close this issue.

In the future, please report issues with the Core ML Framework to bugreport.apple.com.