Converted PyTorch model is significantly larger and slower than the original

iansampson commented 2 years ago

I successfully converted a pre-trained PyTorch model to CoreML. The converted model produces the expected output, but the .mlpackage file is more than 45 times larger than the original PyTorch .pth.tar file, and prediction takes more than 7 times longer.

Here’s my fork of the model repo. My conversion script is trace.py, which also includes one composite operator (for atan2). The model code is found in aia_inter_new.py and aia_trans.py. The pre-trained model that I use for conversion is BEST_MODEL/aia_merge_dns300_conti.pth.tar. If it’s useful, I can also provide the converted .mlpackage and the traced .pt file.

I’ll be grateful for your help :). This seems to be a bug, and I’m not sure how to go about resolving it on my own.

To Reproduce

Clone the repo and check out the coreml branch:

git clone https://github.com/iansampson/DBT-Net.git
cd DBT-Net
git checkout coreml

Assuming PyTorch and coremltools are already installed, install these additional dependencies:

pip install ptflops
pip install librosa
pip install pesq

Trace and convert the model:

python trace.py

The script traces the model with JIT (using a random input of a fixed size), converts the traced model to CoreML, and prints out prediction times for the original model, the traced model, and the CoreML model. Both the JIT trace and the CoreML model are saved in the coreml directory.

On my machine (a 2018 13" MacBook Pro), trace.py prints out the following values:

Prediction time for original model:  15.275246999999979
Prediction time for traced model:  15.296491000000003
Prediction time for CoreML model:  108.631306

And the file sizes of the respective models are:

aia_merge_dns300_conti.pth.tar: 11.8 MB
dbt-net_aia_merge_dns300.pt: 12.2 MB
dbt-net_aia_merge_dns300.mlpackage: 538.8 MB
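
A note on measuring these sizes: an .mlpackage is a directory rather than a single file, so its size is the sum of the files inside it. The sketch below shows one way to measure it and recomputes the ratios from the figures reported above. The helper name path_size_bytes is hypothetical and is not part of the repo or of coremltools.

```python
import os

def path_size_bytes(path):
    """Total size in bytes of a file, or of every file under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# Ratios recomputed from the sizes and times reported in this comment.
size_ratio = 538.8 / 11.8                        # .mlpackage MB / .pth.tar MB
time_ratio = 108.631306 / 15.275246999999979     # CoreML s / original s
print(f"size ratio: {size_ratio:.1f}x, time ratio: {time_ratio:.1f}x")
# prints: size ratio: 45.7x, time ratio: 7.1x
```

Calling path_size_bytes on the saved .mlpackage and on the .pth.tar file should give numbers comparable to those listed above, assuming the sizes were reported in decimal megabytes.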

System environment:

coremltools version: 5.2.0
PyTorch version: 1.10.1
macOS version: Monterey

TobyRoseman commented 2 years ago

Is this still an issue with our latest beta release?

To install our latest beta, run: pip install -U --pre coremltools

iansampson commented 2 years ago

Thanks for the quick response! I just tested with the latest beta (6.0b1), and the results are essentially unchanged (prediction time: 106.645615 s; mlpackage size: 538.8 MB).

iansampson commented 2 years ago

I think I’ve traced the problem to the GRU layer used in TransformerEncoderLayer (aia_inter_new.py, line 41). Here’s a simple script that reproduces the problem:

import time
import torch
import torch.nn as nn
import coremltools as ct

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Single-layer bidirectional GRU: 64 input features, 128 hidden units
        self.gru = nn.GRU(64, 128, 1, bidirectional=True)

    def forward(self, x: torch.Tensor):
        return self.gru(x)

x = torch.randn(401, 80, 64)
model = Model().eval()

traced_model = torch.jit.trace(model, x)
torch.jit.save(traced_model, "gru_test.pt")

ml_model = ct.convert(traced_model,
                      inputs=[ct.TensorType(name="x", shape=x.shape)],
                      convert_to="mlprogram")
ml_model.save("gru_test.mlpackage")

# Measure prediction time for original model
t0 = time.process_time()
output = model(x)
t1 = time.process_time()
print("Prediction time for original model:", t1 - t0)

# Measure prediction time for traced model
t0 = time.process_time()
output = traced_model(x)
t1 = time.process_time()
print("Prediction time for traced model:", t1 - t0)

# Measure prediction time for CoreML model
t0 = time.process_time()
ml_output = ml_model.predict({"x": x.numpy()})
t1 = time.process_time()
print("Prediction time for CoreML model:", t1 - t0)

On my MacBook, this prints:

Prediction time for original model: 0.4766010000000005
Prediction time for traced model: 0.5226019999999991
Prediction time for CoreML model: 7.105665
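
One caveat when reading these numbers: time.process_time() reports CPU time summed over all threads of the process, not elapsed time, so a multi-threaded backend can appear slower than it feels in practice. The first predict call may also include one-time setup cost. A minimal sketch of a wall-clock measurement with a warm-up run follows. The helper name time_call and its parameters are hypothetical, and the workload in the demo line is only a stand-in.

```python
import time

def time_call(fn, warmup=1, repeats=5):
    """Return the median wall-clock seconds of fn() over `repeats` runs."""
    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    samples.sort()
    return samples[len(samples) // 2]

# Stand-in workload; in the script above this would be, for example,
# lambda: ml_model.predict({"x": x.numpy()})
print("median seconds:", time_call(lambda: sum(range(100_000))))
```

Whether this changes the picture for the GRU model is not known from the thread. It only removes two possible sources of distortion from the comparison.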

And the file size of the saved models is:

gru_test.pt: 601 KB
gru_test.mlpackage: 33.3 MB
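
For scale, the weights of this GRU can be counted by hand. The sketch below assumes the standard torch.nn.GRU parameter layout (weight_ih, weight_hh, bias_ih and bias_hh per direction, each covering three gates). The function name gru_param_count is hypothetical.

```python
def gru_param_count(input_size, hidden_size, num_layers=1, bidirectional=False):
    """Count parameters of a GRU laid out like torch.nn.GRU with biases."""
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers after the first consume the concatenated direction outputs.
        layer_input = input_size if layer == 0 else hidden_size * directions
        per_direction = (
            3 * hidden_size * layer_input      # weight_ih (three gates)
            + 3 * hidden_size * hidden_size    # weight_hh (three gates)
            + 2 * 3 * hidden_size              # bias_ih and bias_hh
        )
        total += directions * per_direction
    return total

params = gru_param_count(64, 128, 1, bidirectional=True)
fp32_bytes = params * 4  # 4 bytes per float32 weight
print(params, fp32_bytes)  # prints: 148992 595968
```

About 596 KB of float32 weights is consistent with the 601 KB .pt file, so the 33.3 MB .mlpackage is roughly 55 times larger than the weights alone would account for.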

apple / coremltools

Converted PyTorch model is significantly larger and slower than the original #1551
