apple / coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
https://coremltools.readme.io
BSD 3-Clause "New" or "Revised" License

NameError: name 'MistralCausalLM' is not defined #2256

Open Paramstr opened 3 weeks ago

Paramstr commented 3 weeks ago

❓Question: I am trying to replicate the WWDC24 session "Bring your machine learning and AI models to Apple silicon":

https://developer.apple.com/videos/play/wwdc2024/10159/

In the walkthrough they show a code example in which they convert a Mistral model. When I try to replicate the same code, I get this error:

NameError: name 'MistralCausalLM' is not defined

My script

import torch
from torch import nn
import numpy as np
import coremltools as ct
from transformers import AutoModelForCausalLM, AutoTokenizer

# Define the class for the Stateful Mistral model
class StatefulMistral(torch.nn.Module):
    def __init__(self, modelPath, batchSize=1, contextSize=2048):
        super().__init__()
        self.model = MistralCausalLM.from_pretrained(modelPath)

        self.register_buffer("keyCache", torch.zeros(self.model.kvCacheShape))
        self.register_buffer("valueCache", torch.zeros(self.model.kvCacheShape))

    def forward(self, inputIds, causalMask):
        return self.model(inputIds, causalMask, self.keyCache, self.valueCache).logits

torch_model = StatefulMistral("mistralai/Mistral-7B-Instruct-v0.2").eval()

PabloButron commented 3 weeks ago

Did you find a solution? I found this exact class name in Keras, but that class does not have a from_pretrained method.

junpeiz commented 2 weeks ago

The sample util code is having some final touches done, and will be released once completed. Thank you for your patience!

alexeichhorn commented 1 week ago

You have to import it like this:

from transformers import AutoModelForCausalLM, AutoTokenizer, MistralForCausalLM

But you will quickly encounter even more issues after that (at least I do).

daltheman commented 1 week ago

I'm still struggling too. Also looks like there's a problem with kvCache.
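A note on the kvCache problem: as far as I can tell, kvCacheShape is an attribute of the WWDC demo wrapper and not of Hugging Face's MistralForCausalLM, so the register_buffer lines in the script fail even after the import is fixed. Below is a minimal sketch of how such a shape could be derived from the model configuration. The AttentionConfig tuple and the kv_cache_shape helper are hypothetical names, and the (layers, batch, KV heads, context, head dim) layout is an assumption, not necessarily what the demo uses. The numbers are the published Mistral-7B configuration values.

```python
from typing import NamedTuple, Tuple


class AttentionConfig(NamedTuple):
    """Hypothetical stand-in for the relevant fields of a model config."""
    num_hidden_layers: int
    num_attention_heads: int
    num_key_value_heads: int
    hidden_size: int


def kv_cache_shape(config: AttentionConfig,
                   batch_size: int = 1,
                   context_size: int = 2048) -> Tuple[int, int, int, int, int]:
    """Assumed layout: (layers, batch, KV heads, context, head dim)."""
    # Each attention head works on an equal share of the hidden size.
    head_dim = config.hidden_size // config.num_attention_heads
    return (config.num_hidden_layers, batch_size,
            config.num_key_value_heads, context_size, head_dim)


# Mistral-7B: 32 layers, 32 attention heads, 8 KV heads, hidden size 4096.
MISTRAL_7B = AttentionConfig(32, 32, 8, 4096)
print(kv_cache_shape(MISTRAL_7B))  # (32, 1, 8, 2048, 128)
```

With a real checkpoint, the same four fields can be read from the loaded model's config object and the resulting tuple passed to torch.zeros in place of self.model.kvCacheShape.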

RobertBiehl commented 1 week ago

@junpeiz

The sample util code is having some final touches done, and will be released once completed. Thank you for your patience!

If I'm not mistaken, the code you use (demo_utils) is not related to Hugging Face transformers models. Like others above this comment, I'm trying to convert LLMs from Hugging Face transformers, but from scratch, without depending too much on your example code. I'm encountering issues when accessing slices of the key/value cache tensors.

(see https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py#L788)

cache_position = cache_kwargs.get("cache_position")
k_out = self.key_cache[layer_idx]
v_out = self.value_cache[layer_idx]

k_out[:, :, cache_position] = key_states
v_out[:, :, cache_position] = value_states

The assignment is causing issues in

@register_torch_op
def index_put(context, node):

as the cache_position tensor contains a whole slice, typically range(0, n) where n < context_size, and index_put does not seem to support proper slices (except the full slice). It crashes when it fails to concatenate the non-scalar cache_position into an index tensor consisting of one scalar per dimension.

    add_op(context, node)
  File "coremltools/converters/mil/frontend/torch/ops.py", line 3989, in index_put
    begin = mb.concat(values=begin, axis=0)
  File "coremltools/converters/mil/mil/ops/registry.py", line 183, in add_op
    return cls._add_op(op_cls_to_add, **kwargs)
  File "coremltools/converters/mil/mil/builder.py", line 202, in _add_op
    new_op.type_value_inference()
  File "coremltools/converters/mil/mil/operation.py", line 257, in type_value_inference
    output_types = self.type_inference()
  File "coremltools/converters/mil/mil/ops/defs/iOS15/tensor_operation.py", line 1011, in type_inference
    raise ValueError(msg.format(v.name, v.rank, rank))
ValueError: Input squeeze_0 has rank 1 != other inputs rank 0

Here is also a screenshot from the debugger to see the tensor dimensions.

[Screenshot 2024-07-15 at 15 49 58: debugger view of the tensor dimensions]

is0 and is20 should end up with the same inferred shape, by the way.

Questions: Could you give some pointers on how to model the attention cache in a way that it is compatible with CoreML? How did you conceptually implement cache retrieval and update in your example code?
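One possible direction for the cache update described above: since cache_position is typically a contiguous range, the tensor-index assignment can be expressed as a plain slice with scalar begin and end bounds. The sketch below uses a 1-D Python list as a stand-in for the sequence axis of the cache. The contiguous_bounds and update_cache helpers are hypothetical names, not part of coremltools or transformers. In PyTorch the corresponding line would be k_out[:, :, begin:end] = key_states. Whether that form converts cleanly is an assumption that I have not verified against the converter.

```python
def contiguous_bounds(cache_position):
    """Return (begin, end) if the positions equal range(begin, end)."""
    positions = [int(p) for p in cache_position]
    if not positions:
        raise ValueError("cache_position is empty")
    begin, end = positions[0], positions[-1] + 1
    if positions != list(range(begin, end)):
        raise ValueError("cache_position is not a contiguous range")
    return begin, end


def update_cache(cache, new_states, cache_position):
    """Write new_states into cache[begin:end] using scalar slice bounds."""
    begin, end = contiguous_bounds(cache_position)
    if end - begin != len(new_states):
        raise ValueError("new_states length does not match cache_position")
    # Slice assignment; with tensors: cache[:, :, begin:end] = new_states
    cache[begin:end] = new_states
    return cache


context_size = 8
key_cache = [0.0] * context_size                        # preallocated cache
update_cache(key_cache, [1.0, 2.0, 3.0], range(0, 3))   # prompt tokens
update_cache(key_cache, [4.0], [3])                     # one decoded token
print(key_cache)  # [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
```

The point of the design is that begin and end are scalars, so no rank-1 index tensor has to be concatenated with rank-0 values, which is the step that fails in the traceback above.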