apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io

Segmentation Fault when calling model.predict #1059

Open ben-xD opened 3 years ago

ben-xD commented 3 years ago

Hello fellow developers 👋

🐞Describe the bug

Trace

No trace, just [1] 80595 segmentation fault python3 python_file_name.py
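Since the crash happens in native code, Python prints no traceback. One way to get at least the Python-side stack at the moment of the fault (a debugging sketch using only the standard library) is faulthandler:

import faulthandler
faulthandler.enable()  # dump the Python traceback of all threads if the process gets SIGSEGV

# ... then run the reproduction below in the same process ...

Equivalently, run the script with python3 -X faulthandler python_file_name.py.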

To Reproduce

First install dependencies: pip install tensorflow numpy keras-vggface coremltools keras_applications

import coremltools as ct
from keras_vggface import VGGFace
import numpy as np
from tensorflow.keras.preprocessing import image
from keras_vggface import utils

def create_core_ml_model():
    # NHWC input matching the Keras model's expected input shape
    input_type = ct.TensorType(shape=(1, 224, 224, 3))  # renamed so it doesn't shadow the builtin
    keras_model = VGGFace(model="senet50", pooling="avg", include_top=False, input_shape=(224, 224, 3))
    coreml_model = ct.convert(keras_model, inputs=[input_type])
    coreml_model.save("model.mlmodel")

create_core_ml_model()

# Download a random image
image_path = "https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Ftinyjpg.com%2Fimages%2Fsocial%2Fwebsite.jpg&f=1&nofb=1"

import urllib.request

r = urllib.request.urlopen(image_path)
with open("image.jpg", "wb") as f:
    f.write(r.read())
img = image.load_img('image.jpg', target_size=(224, 224))

x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = utils.preprocess_input(x, version=2)

coreml_model = ct.models.MLModel("model.mlmodel")
output_dictionary = coreml_model.predict({"input_1": x})  # <---- THIS IS WHERE IT SIGSEGVs, with no other warnings
output = output_dictionary["Identity"][0]
print("output: ", output)

System environment (please complete the following information):

ben-xD commented 3 years ago

Here is the converted model's layer distribution from Xcode too; maybe using TensorType inputs along with some of these layers causes an issue?

(Two screenshots of the Xcode layer distribution attached, 2021-01-14.)
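If the suspicion is that TensorType inputs interact badly with some of these layers, one cross-check (a sketch only; the scale value is illustrative and does not reproduce keras_vggface's version=2 preprocessing) is to convert with an ImageType input instead:

# Sketch: same model, ImageType input instead of TensorType.
image_input = ct.ImageType(shape=(1, 224, 224, 3), scale=1 / 255.0)
coreml_model = ct.convert(keras_model, inputs=[image_input])
coreml_model.save("model_image_input.mlmodel")

predict would then take a PIL.Image rather than a NumPy array, so the test script would need a small change.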

TobyRoseman commented 3 years ago

I can reproduce this issue.

I suspect this is an overflow issue. The neural network is quite deep (320 layers), and x contains values as large as 163.

The following code works fine.

for _ in range(100):
    z = np.random.rand(1, 224, 224, 3)
    output_dictionary = coreml_model.predict({"input_1": z})

np.random.rand produces values between 0 and 1.
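If overflow is the cause, the crash should track input magnitude rather than content. A quick check along those lines (a sketch; x is the preprocessed array from the reproduction above):

# Same image, same pipeline, but rescaled into [0, 1].
# If this predicts fine while the raw-magnitude x crashes, the failure
# correlates with magnitude, consistent with an overflow deep in the network.
x_small = (x - x.min()) / (x.max() - x.min())
output_dictionary = coreml_model.predict({"input_1": x_small})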

ben-xD commented 3 years ago

Interesting. I have to admit the model's output doesn't look well formed: some values are 1000+, others 0.001. Let me know if there's anything I can help with if you want to dig into this further.

piraka9011 commented 1 year ago

I also hit this on the latest main @ 3569369 when my model has too many dynamic shapes. Tested on Python 3.8 and 3.9, macOS 12.3.1 (MBP M1 Pro), Xcode 13.4.1 (build 13F100).

For me specifically, I have an acoustic model with two inputs: audio_signal with shape (Batch, Features, SequenceLength) and length with shape (Batch,), where Batch and SequenceLength are dynamic (modeled using ct.RangeDim()). If Batch is fixed to 1, however, I can run predict with no SIGSEGV. (A bounded-shape workaround sketch follows the example below.)

Example

import coremltools as ct
from nemo.collections.asr.models import EncDecCTCModelBPE
import torchaudio
import torch

pre_trained_model_name = "stt_en_citrinet_256"
model = EncDecCTCModelBPE.from_pretrained(pre_trained_model_name, map_location='cpu')
model.eval()
input_example = model.encoder.input_example()
example_input = input_example[0]
example_input_len = input_example[1]

# Does not segfault if shape=(1, example_input.shape[1], ct.RangeDim())
audio_signal_shape = ct.Shape(shape=(ct.RangeDim(), example_input.shape[1], ct.RangeDim()))
# Does not segfault if shape=(1,)
length_shape = ct.Shape(shape=(ct.RangeDim(),))

# NeMo Export
export_output_path = f"/tmp/{pre_trained_model_name}.ts"
model.export(
    export_output_path,
    check_trace=True,
    input_example=(example_input, example_input_len)
)

# CoreML Convert
scripted_model = torch.jit.load(export_output_path)
ct_model = ct.convert(
    scripted_model,
    convert_to="mlprogram",
    inputs=[
        ct.TensorType(name="audio_signal", shape=audio_signal_shape),
        ct.TensorType(name="length", shape=length_shape)
    ],
    outputs=[ct.TensorType(name="log_probs")],
    compute_units=ct.ComputeUnit.ALL,
)
ct_model_output_path = f"/tmp/{pre_trained_model_name}.mlpackage"
ct_model.save(ct_model_output_path)

# Testing
example_wav_file = "/path/to/audio.wav"
input_signal, sr = torchaudio.load(example_wav_file)
input_signal_shape = torch.tensor([input_signal.shape[1]])
processed_signal, processed_signal_length = model.preprocessor(
    input_signal=input_signal, length=input_signal_shape
)
# Or just use `example_input` and `example_input_len` instead of the audio file.
coreml_inputs = {
    "audio_signal": processed_signal.to(torch.int32).numpy(),
    "length": processed_signal_length.to(torch.int32).numpy(),
}
coreml_outputs = ct_model.predict(coreml_inputs)
log_probs = coreml_outputs['log_probs']

You can try it on any audio file from https://huggingface.co/datasets/librispeech_asr; you might need to convert to wav first with ffmpeg -i audio.mp3 -ar 16000 -ac 1 audio.wav.
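Since Batch = 1 works, one workaround worth trying (a sketch; the upper bounds are made-up values I have not verified) is to give the RangeDims explicit bounds instead of leaving them unbounded:

# Bounded dynamic dimensions; ct.RangeDim takes (lower_bound, upper_bound).
audio_signal_shape = ct.Shape(
    shape=(ct.RangeDim(1, 8), example_input.shape[1], ct.RangeDim(1, 4096))
)
length_shape = ct.Shape(shape=(ct.RangeDim(1, 8),))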

@TobyRoseman, is there a way we can debug this to figure out the root cause? It seems to be an issue with Core ML according to the crash report / stack trace the OS generated, available here (also reported to Apple, FWIW 🤷).