apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

TF Decoder model converted to CoreML model starts returning NaN MLMultiArray during inference from a certain time step. #2072

Open seungjun-green opened 10 months ago

seungjun-green commented 10 months ago

I'm having the same problem. I created a simple Transformer decoder in TensorFlow, and it works well. But when I convert it to a CoreML model, from some point on it starts outputting only MLMultiArrays filled with NaN values. The strange thing is that if I reinitialize the model at every time step, CoreML never returns a NaN array during inference.

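for i in 0..<30 {
    // Re-creating the model at every time step avoids the NaN output.
    decoder = try! iOS_Deocder(configuration: config) // <- Like this!
    // The generated prediction method throws, so it needs try.
    let ddd = try! decoder.prediction(input_1: image_feature, input_2: tokens!).Identity

    // some additional code

}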

To address this issue, I tried converting the TF model to a CoreML model with compute_precision=coremltools.precision.FLOAT32 and with compute_precision=coremltools.precision.FLOAT16, and also tried setting let config = MLModelConfiguration(); config.computeUnits = .cpuOnly, but none of these worked.
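For reference, the conversion settings described above might be passed as in the following Python sketch. The function name convert_decoder and its use_float32 flag are hypothetical, and the sketch assumes a coremltools version in which ct.convert accepts compute_precision and compute_units; it is not the exact script used in this issue.

```python
def convert_decoder(tf_model, use_float32=True):
    """Convert a TF/Keras model to an ML Program with an explicit precision.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so this file can be loaded without coremltools installed.
    import coremltools as ct

    precision = ct.precision.FLOAT32 if use_float32 else ct.precision.FLOAT16
    return ct.convert(
        tf_model,
        convert_to="mlprogram",  # compute_precision applies to ML Programs
        compute_precision=precision,
        compute_units=ct.ComputeUnit.CPU_ONLY,  # restrict to CPU, like .cpuOnly in Swift
    )
```

Converting once with each precision and comparing the outputs against the TensorFlow model can help show whether the NaN values depend on precision at all.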

Strangely, though, changing how I define the model in TF improved things slightly.

The final output layer in TF looked like this:

final_output = self.final_layer(seq_layer_output)
final_output = final_output + custom_bias

but removing the last line, like this:

final_output = self.final_layer(seq_layer_output)

improved the CoreML model in the following way: previously the CoreML model started generating NaN arrays from the third time step of inference; with this change it starts generating NaN arrays from the 5th or 6th. Removing all the for loops over the decoder layers also improved it.

My guess is that during some inference step, some internal state of the CoreML model is being stored, and that this affects inference at the next time step?

I've spent 4 days on this but can't figure it out. Can anyone help me with this issue?

TobyRoseman commented 10 months ago

Can you share simple standalone code to reproduce the issue?

This sounds like it's an issue with the Core ML Framework, not the coremltools Python package. If it's a problem with the Core ML Framework, you should submit the bug (with code to reproduce) using the Feedback Assistant.
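A standalone check of the kind requested here could look roughly like the Python sketch below. It reuses one model instance for every step and reports the first step whose output contains a NaN. The helper names (contains_nan, first_nan_step, make_coreml_step) are hypothetical; the input names input_1 and input_2 and the output name Identity are taken from the Swift snippet in the report; the token update between steps is omitted. Prediction through coremltools only runs on macOS.

```python
import math


def contains_nan(values):
    """Return True if a (possibly nested) list or tuple of numbers contains a NaN."""
    if isinstance(values, (list, tuple)):
        return any(contains_nan(v) for v in values)
    return isinstance(values, float) and math.isnan(values)


def first_nan_step(predict_step, num_steps=30):
    """Call predict_step(step) for each step and return the first step
    whose output contains a NaN, or None if no step does."""
    for step in range(num_steps):
        if contains_nan(predict_step(step)):
            return step
    return None


def make_coreml_step(model_path, image_feature, tokens):
    """Build a step function that reuses a single loaded Core ML model.

    Hypothetical helper: the feature names are assumed from the report.
    """
    # Imported lazily so the helpers above work without coremltools installed.
    import coremltools as ct

    model = ct.models.MLModel(model_path, compute_units=ct.ComputeUnit.CPU_ONLY)

    def step(_):
        out = model.predict({"input_1": image_feature, "input_2": tokens})
        return out["Identity"].tolist()  # NumPy array to nested Python lists

    return step
```

If repeated calls with identical inputs begin returning NaN at some step, that would be consistent with state carrying over between predictions, and the same script could be attached to a Feedback Assistant report.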