LukeLIN-web commented 1 month ago

🐞Describing the bug

Stack Trace

root@lambda-server:~/share/Wav2Lip# python trans.py 
Load checkpoint from: ./checkpoints/wav2lip.pth
XGBoost version 1.7.5 has not been tested with coremltools. You may run into unexpected errors. XGBoost 1.4.2 is the most recent version that has been tested.
Failed to load _MLModelProxy: No module named 'coremltools.libcoremlpython'
Traceback (most recent call last):
  File "/root/share/Wav2Lip/trans.py", line 31, in <module>
    import coremltools as ct
  File "/usr/local/lib/python3.10/dist-packages/coremltools/__init__.py", line 121, in <module>
    from . import converters, models, optimize, proto
  File "/usr/local/lib/python3.10/dist-packages/coremltools/converters/__init__.py", line 7, in <module>
    from . import libsvm, sklearn, xgboost
  File "/usr/local/lib/python3.10/dist-packages/coremltools/converters/libsvm/__init__.py", line 8, in <module>
    from . import _libsvm_converter, _libsvm_util
  File "/usr/local/lib/python3.10/dist-packages/coremltools/converters/libsvm/_libsvm_converter.py", line 7, in <module>
    from coremltools.models import _METADATA_SOURCE, _METADATA_VERSION
  File "/usr/local/lib/python3.10/dist-packages/coremltools/models/__init__.py", line 39, in <module>
    from . import ml_program
  File "/usr/local/lib/python3.10/dist-packages/coremltools/models/ml_program/__init__.py", line 6, in <module>
    from . import compression_utils
  File "/usr/local/lib/python3.10/dist-packages/coremltools/models/ml_program/compression_utils.py", line 10, in <module>
    from coremltools.optimize.coreml import (
  File "/usr/local/lib/python3.10/dist-packages/coremltools/optimize/__init__.py", line 11, in <module>
    from . import torch
  File "/usr/local/lib/python3.10/dist-packages/coremltools/optimize/torch/__init__.py", line 6, in <module>
    from coremltools.optimize.torch import (
  File "/usr/local/lib/python3.10/dist-packages/coremltools/optimize/torch/palettization/__init__.py", line 59, in <module>
    from .fake_palettize import FakePalettize
  File "/usr/local/lib/python3.10/dist-packages/coremltools/optimize/torch/palettization/fake_palettize.py", line 27, in <module>
    from .palettization_config import DEFAULT_PALETTIZATION_ADVANCED_OPTIONS
  File "/usr/local/lib/python3.10/dist-packages/coremltools/optimize/torch/palettization/palettization_config.py", line 396, in <module>
    {
  File "/usr/local/lib/python3.10/dist-packages/coremltools/optimize/torch/palettization/palettization_config.py", line 397, in <dictcomp>
    key: ModuleDKMPalettizerConfig.from_dict(val)
  File "/usr/local/lib/python3.10/dist-packages/coremltools/optimize/torch/_utils/python_utils.py", line 87, in from_dict
    return converter.structure_attrs_fromdict(data_dict, cls)
  File "/usr/local/lib/python3.10/dist-packages/cattrs/converters.py", line 756, in structure_attrs_fromdict
    return cl(**conv_obj)
  File "<attrs generated init coremltools.optimize.torch.palettization.palettization_config.ModuleDKMPalettizerConfig>", line 13, in __init__
    _setattr('dtype', __attr_converter_dtype(dtype))
  File "/usr/local/lib/python3.10/dist-packages/coremltools/optimize/torch/_utils/torch_utils.py", line 101, in maybe_convert_str_to_dtype
    "fp8_e4m3": _torch.float8_e4m3fn,
AttributeError: module 'torch' has no attribute 'float8_e4m3fn'
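
The last frame of the trace fails because the installed `torch` does not expose `float8_e4m3fn`, which suggests a PyTorch build older than this coremltools release expects. A minimal sketch for checking that directly; `check_torch_float8` is a hypothetical helper written for this note, not part of torch or coremltools:

```python
import importlib
import importlib.util


def check_torch_float8(module_name: str = "torch") -> str:
    """Report whether the named module exposes the float8_e4m3fn dtype."""
    # find_spec on a top-level name does not import the module.
    if importlib.util.find_spec(module_name) is None:
        return f"{module_name} is not installed"
    mod = importlib.import_module(module_name)
    version = getattr(mod, "__version__", "unknown")
    if hasattr(mod, "float8_e4m3fn"):
        return f"{module_name} {version}: float8_e4m3fn available"
    return f"{module_name} {version}: float8_e4m3fn missing"


print(check_torch_float8())
```

If the attribute is reported as missing, upgrading PyTorch is probably the first thing to try.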

To Reproduce

Please add a minimal code example that can reproduce the error when running it.
```
import torch
import torchvision
from models import Wav2Lip

def load_model(path):
    model = Wav2Lip()
    print("Load checkpoint from: {}".format(path))
    checkpoint = torch.load(path)
    s = checkpoint["state_dict"]
    new_s = {}
    for k, v in s.items():
        new_s[k.replace('module.', '')] = v
    model.load_state_dict(new_s)

    return model.eval()

checkpoint_path = './checkpoints/wav2lip.pth'
torch_model = load_model(checkpoint_path)
# Set the model in evaluation mode.
torch_model.eval()

# Trace the model with random data.
# example_input = torch.rand(1, 3, 224, 224)
img_batch = torch.randn(1, 6, 5, 96, 96)
mel_batch = torch.randn(1, 5, 1, 80, 16)
traced_model = torch.jit.trace(torch_model, (mel_batch, img_batch))
out = traced_model(mel_batch, img_batch)

# example_input = (mel_batch, img_batch)

import coremltools as ct
model = ct.convert(
    traced_model,
    convert_to="mlprogram",
    inputs=[ct.TensorType(shape=mel_batch.shape), ct.TensorType(shape=img_batch.shape)]
)
model.save("newmodel.mlmodel")
```


- If the model conversion succeeds, but there is a numerical mismatch in predictions, please include the code used for comparisons.

## System environment (please complete the following information):
 - coremltools version:  8.0 
 - OS (e.g. MacOS version or Linux type):  Ubuntu
 - Any other relevant version information (e.g. PyTorch or TensorFlow version):
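
The version fields above can be collected with a short script. This is only a sketch: `package_version` and `environment_report` are hypothetical helper names, and the distribution names `coremltools` and `torch` are assumed to be what pip installed.

```python
import platform
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("coremltools", "torch")) -> dict:
    """Collect OS, Python, and package versions for a bug report."""
    report = {"os": platform.platform(), "python": sys.version.split()[0]}
    for name in packages:
        report[name] = package_version(name)
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```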

## Additional context
- Add anything else about the problem here that you want to share.

LukeLIN-web commented 1 month ago

But we can still convert the model:

import torch
import torchvision
from models import Wav2Lip

def load_model(path):
    model = Wav2Lip()
    print("Load checkpoint from: {}".format(path))
    checkpoint =  torch.load(path)
    s = checkpoint["state_dict"]
    new_s = {}
    for k, v in s.items():
        new_s[k.replace('module.', '')] = v
    model.load_state_dict(new_s)

    return model.eval()

checkpoint_path = './checkpoints/wav2lip.pth'
torch_model = load_model(checkpoint_path)
# Set the model in evaluation mode.
torch_model.eval()

# Trace the model with random data.
# example_input = torch.rand(1, 3, 224, 224) 
img_batch = torch.randn(1, 6, 5, 96, 96)
mel_batch = torch.randn(1, 5, 1, 80, 16)
traced_model = torch.jit.trace(torch_model, (mel_batch, img_batch))
out = traced_model(mel_batch, img_batch)

# example_input = (mel_batch, img_batch)

import coremltools as ct
model = ct.convert(
    traced_model,
    convert_to="mlprogram",
    inputs=[ct.TensorType(shape=mel_batch.shape),ct.TensorType(shape=img_batch.shape) ]
 )
model.save("newmodel.mlpackage")

It works on Ubuntu, even though `Failed to load _MLModelProxy: No module named 'coremltools.libcoremlpython'` still appears.
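
One side note on `load_model` above: `k.replace('module.', '')` removes every occurrence of `module.` in a key, not only the leading `DataParallel` prefix. A hedged sketch of a prefix-only variant follows; `strip_prefix` is a hypothetical helper, and plain integers stand in for tensors so that it runs without torch.

```python
def strip_prefix(state_dict: dict, prefix: str = "module.") -> dict:
    """Return a copy of state_dict with a leading prefix removed from each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Plain values stand in for tensors here.
example = {"module.conv.weight": 1, "module.submodule.bias": 2, "fc.weight": 3}
print(strip_prefix(example))
# A blanket str.replace would also eat the "module." inside "submodule.":
print("module.submodule.bias".replace("module.", ""))
```

For the Wav2Lip checkpoint this may make no difference, but it avoids surprises when a layer name happens to end in `module`.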

glenn-jocher commented 1 month ago

Same problem.

NaturalStupidlty commented 1 month ago

I have encountered the same problem while trying to convert DeepLabV3 to Core ML, and in my case the conversion fails. There is also an error when converting a profiler op, so I am unsure whether the missing coremltools.libcoremlpython is the actual cause of the crash.

Failed to load _MLModelProxy: No module named 'coremltools.libcoremlpython'
Converting PyTorch Frontend ==> MIL Ops:   0%|          | 0/473 [00:00<?, ? ops/s]

ERROR - converting 'profiler::_record_function_enter_new' op (located at: '5'):

Ubuntu 22.04, torch 2.4.0

import torch
import network

import coremltools as ct

num_classes = 21
model_name = 'deeplabv3_resnet50'
weights = 'weights/best_deeplabv3_resnet50_voc_os16.pth'
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_map = {
    'deeplabv3_resnet50': network.deeplabv3_resnet50,
    'deeplabv3plus_resnet50': network.deeplabv3plus_resnet50,
    'deeplabv3_resnet101': network.deeplabv3_resnet101,
    'deeplabv3plus_resnet101': network.deeplabv3plus_resnet101,
    'deeplabv3_mobilenet': network.deeplabv3_mobilenet,
    'deeplabv3plus_mobilenet': network.deeplabv3plus_mobilenet
}
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = model_map[model_name](num_classes=num_classes, output_stride=16)
checkpoint = torch.load(weights, map_location=device, weights_only=False)
model.load_state_dict(checkpoint["model_state"])
model = torch.nn.DataParallel(model)
model.to(device)
model.eval()

# Trace the model with random data.
example_input = torch.rand(1, 3, 1282, 1026).to(device)
traced_model = torch.jit.trace(model, example_input)
with torch.no_grad():
    outputs = traced_model(example_input)
    #preds = outputs.detach().max(dim=1)[1].cpu().numpy()

# Convert to Core ML neural network using the Unified Conversion API.
model = ct.convert(
    traced_model,
    convert_to="neuralnetwork",
    inputs=[ct.TensorType(shape=example_input.shape)]
)

model.save(f"{model_name}.mlmodel")

TobyRoseman commented 1 month ago

What OS and version of Python are you using?

NaturalStupidlty commented 1 month ago

> I have encountered the same problem trying to convert the DeepLabV3 to CoreML and in my case it does not work. There is also an error with the conversion of the profiler so I am unsure if the coremltools.libcoremlpython is the actual crash reason.

The missing coremltools.libcoremlpython was not the cause of the crash in my case. I have successfully run the conversion of DeepLabV3 to Core ML even with this message present.

Btw, the error came from using torch.nn.DataParallel; you need to convert only the wrapped module:

import torch
import network

import coremltools as ct

num_classes = 21
model_name = 'deeplabv3_resnet50'
weights = 'weights/best_deeplabv3_resnet50_voc_os16.pth'
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model_map = {
    'deeplabv3_resnet50': network.deeplabv3_resnet50,
    'deeplabv3plus_resnet50': network.deeplabv3plus_resnet50,
    'deeplabv3_resnet101': network.deeplabv3_resnet101,
    'deeplabv3plus_resnet101': network.deeplabv3plus_resnet101,
    'deeplabv3_mobilenet': network.deeplabv3_mobilenet,
    'deeplabv3plus_mobilenet': network.deeplabv3plus_mobilenet
}

model = model_map[model_name](num_classes=num_classes, output_stride=16)
checkpoint = torch.load(weights, map_location=device, weights_only=False)
model.load_state_dict(checkpoint["model_state"])
model = torch.nn.DataParallel(model)
# Either use model.module to access the model or remove the DataParallel wrapper.
model = model.module
model.to(device)
model.eval()

# Trace the model with random data.
example_input = torch.rand(1, 3, 1282, 1026).to(device)
traced_model = torch.jit.trace(model, example_input)
with torch.no_grad():
    # torch.Size([1, 21, 1282, 1026])
    outputs = traced_model(example_input)
    #preds = outputs.detach().max(dim=1)[1].cpu().numpy()

# Convert to Core ML neural network using the Unified Conversion API.
model = ct.convert(
    traced_model,
    convert_to="neuralnetwork",
    inputs=[ct.TensorType(shape=example_input.shape)]
)

model.save(f"{model_name}.mlmodel")
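
The fix above comes down to handing the tracer the inner model rather than the parallel wrapper. A small sketch of that step as a reusable helper; `unwrap_parallel` is a hypothetical name, and it matches on the class name only so that it runs without torch installed (an `isinstance` check against `torch.nn.DataParallel` would be stricter).

```python
def unwrap_parallel(model):
    """Return the inner model if `model` looks like a (Distributed)DataParallel wrapper."""
    wrapper_names = ("DataParallel", "DistributedDataParallel")
    # Both wrappers keep the original model in the `.module` attribute.
    if type(model).__name__ in wrapper_names and hasattr(model, "module"):
        return model.module
    return model
```

It would be called right before tracing, for example `traced_model = torch.jit.trace(unwrap_parallel(model), example_input)`.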

NaturalStupidlty commented 1 month ago

> What OS and version of Python are you using?

Ubuntu 22.04 and Python 3.10.

TobyRoseman commented 1 month ago

I just downloaded the manylinux1 Python 3.10 wheel for our 8.0 release. It contains libcoremlpython.so.

I think you must have somehow installed from an egg rather than a wheel, i.e. you're doing a source install rather than a binary install. You could try uninstalling and then running `pip install --prefer-binary coremltools==8.0`.
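
To see whether the installed package actually ships the native library, its directory can be inspected without importing it. A rough sketch, assuming the library file name starts with `libcoremlpython` as in the error message; `find_native_libs` is a hypothetical helper, not a coremltools API.

```python
import importlib.util
from pathlib import Path


def find_native_libs(package: str = "coremltools", stem: str = "libcoremlpython") -> list:
    """List files in the package directory whose names start with `stem`."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []
    found = []
    for location in spec.submodule_search_locations:
        found.extend(sorted(str(p) for p in Path(location).glob(stem + "*")))
    return found


print(find_native_libs() or "no matching library file found")
```

An empty result would point toward a source install or a wheel that lacks the binary, while a non-empty result suggests the file is present but cannot be loaded.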

NaturalStupidlty commented 1 month ago

> I just downloaded the manylinux1 Python 3.10 wheel for our 8.0 release. It contains libcoremlpython.so.
>
> I think you must have somehow installed from an egg rather than a wheel, i.e. you're doing a source install rather than a binary install. You could try uninstalling and then running `pip install --prefer-binary coremltools==8.0`.

In my case it did not help; the issue persists. I have tried a fresh python==3.10 venv with torch==2.4.0 (latest tested) and cuda==12.2 on two different installations of Ubuntu 22.04. Same error.

TobyRoseman commented 1 month ago

Can you share logs for installing coremltools in a fresh environment?

NaturalStupidlty commented 1 month ago

> Can you share logs for installing coremltools in a fresh environment?

Yes, sure!

ihor@naturalstupidity:~/projects/RAI$ python3 -m venv venv
ihor@naturalstupidity:~/projects/RAI$ source venv/bin/activate
(venv) ihor@naturalstupidity:~/projects/RAI$ pip install --prefer-binary coremltools==8.0
Collecting coremltools==8.0
  Downloading coremltools-8.0-cp310-none-manylinux1_x86_64.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 7.8 MB/s eta 0:00:00
Collecting numpy>=1.14.5
  Downloading numpy-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.3/16.3 MB 11.1 MB/s eta 0:00:00
Collecting tqdm
  Downloading tqdm-4.66.5-py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.4/78.4 KB 10.3 MB/s eta 0:00:00
Collecting attrs>=21.3.0
  Downloading attrs-24.2.0-py3-none-any.whl (63 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.0/63.0 KB 10.2 MB/s eta 0:00:00
Collecting sympy
  Downloading sympy-1.13.3-py3-none-any.whl (6.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 11.5 MB/s eta 0:00:00
Collecting cattrs
  Downloading cattrs-24.1.2-py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.4/66.4 KB 10.3 MB/s eta 0:00:00
Collecting protobuf>=3.1.0
  Downloading protobuf-5.28.2-cp38-abi3-manylinux2014_x86_64.whl (316 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 316.6/316.6 KB 11.0 MB/s eta 0:00:00
Collecting pyaml
  Downloading pyaml-24.9.0-py3-none-any.whl (24 kB)
Collecting packaging
  Downloading packaging-24.1-py3-none-any.whl (53 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.0/54.0 KB 8.6 MB/s eta 0:00:00
Collecting exceptiongroup>=1.1.1
  Downloading exceptiongroup-1.2.2-py3-none-any.whl (16 kB)
Collecting typing-extensions!=4.6.3,>=4.1.0
  Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Collecting PyYAML
  Downloading PyYAML-6.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (751 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 751.2/751.2 KB 11.2 MB/s eta 0:00:00
Collecting mpmath<1.4,>=1.1.0
  Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 KB 11.1 MB/s eta 0:00:00
Installing collected packages: mpmath, typing-extensions, tqdm, sympy, PyYAML, protobuf, packaging, numpy, exceptiongroup, attrs, pyaml, cattrs, coremltools
Successfully installed PyYAML-6.0.2 attrs-24.2.0 cattrs-24.1.2 coremltools-8.0 exceptiongroup-1.2.2 mpmath-1.3.0 numpy-2.1.2 packaging-24.1 protobuf-5.28.2 pyaml-24.9.0 sympy-1.13.3 tqdm-4.66.5 typing-extensions-4.12.2

coremltools_venv.log

NaturalStupidlty commented 1 month ago

Interestingly, the error does NOT occur in my macOS Sequoia 15.0.1 Conda env with Python 3.11.5 and coremltools==8.0. I guess it has something to do with the Linux binaries, @TobyRoseman.

TobyRoseman commented 1 month ago

@NaturalStupidlty - I thought coremltools was probably being installed from an egg rather than a wheel, which would explain this result. However, that doesn't seem to be happening: based on the logs you shared, a wheel is being used.

@LukeLIN-web - are you also using Python 3.10?

apple / coremltools

Failed to load _MLModelProxy: No module named 'coremltools.libcoremlpython' #2350
