Open Paramstr opened 3 weeks ago
Did you find a solution? I found these exact words in Keras, but that class does not have the from_pretrained method.
The sample util code is having some final touches done, and will be released once completed. Thank you for your patience!
You have to import it like this:
from transformers import AutoModelForCausalLM, AutoTokenizer, MistralForCausalLM
But you will quickly run into even more issues after that (at least I did).
I'm still struggling too. It also looks like there's a problem with the KV cache.
@junpeiz
> The sample util code is having some final touches done, and will be released once completed. Thank you for your patience!
If I'm not mistaken, the code you use (demo_utils) is not related to Hugging Face transformers models.
Like others above this comment, I'm trying to convert LLMs from Hugging Face transformers, but from scratch, without depending too much on your example code.
I'm encountering issues when reading slices from the key value cache tensors.
(see https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py#L788)
cache_position = cache_kwargs.get("cache_position")
k_out = self.key_cache[layer_idx]
v_out = self.value_cache[layer_idx]
k_out[:, :, cache_position] = key_states
v_out[:, :, cache_position] = value_states
The assignment is causing issues in
@register_torch_op
def index_put(context, node):
because the cache_position tensor contains a whole slice, typically range(0, n) with n < context_size, and index_put
does not seem to support proper slices (only the full slice works).
The conversion crashes when it fails to concatenate the non-scalar cache_position into an index tensor made of one scalar per dimension.
add_op(context, node)
File "coremltools/converters/mil/frontend/torch/ops.py", line 3989, in index_put
begin = mb.concat(values=begin, axis=0)
File "coremltools/converters/mil/mil/ops/registry.py", line 183, in add_op
return cls._add_op(op_cls_to_add, **kwargs)
File "coremltools/converters/mil/mil/builder.py", line 202, in _add_op
new_op.type_value_inference()
File "coremltools/converters/mil/mil/operation.py", line 257, in type_value_inference
output_types = self.type_inference()
File "coremltools/converters/mil/mil/ops/defs/iOS15/tensor_operation.py", line 1011, in type_inference
raise ValueError(msg.format(v.name, v.rank, rank))
ValueError: Input squeeze_0 has rank 1 != other inputs rank 0
Here is also a screenshot from the debugger showing the tensor dimensions.
is0 and is20 should end up with the same inferred shapes, by the way.
Questions: Could you give some pointers on how to model the attention cache in a way that it is compatible with CoreML? How did you conceptually implement cache retrieval and update in your example code?
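One possible direction, sketched under assumptions rather than taken from the sample code: when cache_position is a contiguous range(start, start + n), the index-tensor write from cache_utils can be rewritten as a plain slice write, or as a mask-based blend with no indexed assignment at all. The function names below are hypothetical, and the snippet only checks that the three forms agree in PyTorch. Whether either alternative converts cleanly depends on the coremltools version and is not confirmed anywhere in this thread.

```python
import torch
import torch.nn.functional as F


def update_with_index(cache, positions, new_states):
    """Index-tensor write, as in the quoted cache_utils snippet."""
    out = cache.clone()
    out[:, :, positions] = new_states
    return out


def update_with_slice(cache, start, new_states):
    """Same write expressed as a contiguous slice along the sequence axis."""
    out = cache.clone()
    end = start + new_states.shape[2]
    out[:, :, start:end] = new_states
    return out


def update_with_mask(cache, start, new_states):
    """Same write without indexed assignment: pad the new states, then blend."""
    context_size = cache.shape[2]
    n = new_states.shape[2]
    # Zero-pad along the sequence axis so the new states sit at [start, start + n).
    padded = F.pad(new_states, (0, 0, start, context_size - start - n))
    steps = torch.arange(context_size)
    mask = ((steps >= start) & (steps < start + n)).view(1, 1, context_size, 1)
    # Take the new states inside the window and the old cache everywhere else.
    return torch.where(mask, padded, cache)


# Toy shapes (batch, heads, context_size, head_dim); values are illustrative only.
cache = torch.zeros(1, 2, 8, 4)
new_states = torch.randn(1, 2, 3, 4)
start = 2
positions = torch.arange(start, start + new_states.shape[2])

by_index = update_with_index(cache, positions, new_states)
by_slice = update_with_slice(cache, start, new_states)
by_mask = update_with_mask(cache, start, new_states)
```

All three results are identical here, so the model's forward pass could in principle carry start (or a precomputed mask) instead of a full cache_position tensor. The slice form assumes the written positions are always contiguous, which holds for ordinary prefill and single-token decoding but not for arbitrary position tensors.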
❓Question: I am trying to replicate the WWDC24 showcase "Bring your machine learning and AI models to Apple silicon":
https://developer.apple.com/videos/play/wwdc2024/10159/
In the walkthrough they show a code example in which they convert a Mistral model. When I try to replicate the same code, I get this error:
NameError: name 'MistralCausalLM' is not defined
My script