Closed: uogbuji closed this issue 2 months ago.
OK well I got it past the system role error and now I'm into something deeper.
File "/Users/uche/.local/venv/temp/lib/python3.11/site-packages/mlx_lm/models/gemma2.py", line 159, in __call__
h = layer(h, mask, c)
^^^^^^^^^^^^^^^^^
File "/Users/uche/.local/venv/temp/lib/python3.11/site-packages/mlx_lm/models/gemma2.py", line 122, in __call__
r = self.self_attn(self.input_layernorm(x), mask, cache)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/uche/.local/venv/temp/lib/python3.11/site-packages/mlx_lm/models/gemma2.py", line 80, in __call__
output = mx.fast.scaled_dot_product_attention(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Shapes (1,8,2,32,36) and (32,32) cannot be broadcast.
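For context, this is exactly the kind of shape pair that NumPy-style broadcasting rules (which MLX follows) reject: aligning the shapes from the right, the trailing dimensions are 36 vs. 32, which are neither equal nor 1. A minimal sketch, using NumPy purely to illustrate the rule (the real error comes from the attention mask inside `mx.fast.scaled_dot_product_attention`):

```python
import numpy as np

# The traceback reports attention scores of shape (1, 8, 2, 32, 36)
# against a mask of shape (32, 32). Aligned from the right, the last
# dimensions are 36 vs. 32: incompatible under broadcasting rules.
try:
    np.broadcast_shapes((1, 8, 2, 32, 36), (32, 32))
except ValueError as e:
    print("cannot broadcast:", e)

# A mask whose trailing dimension matched the key length (32, 36)
# would broadcast cleanly against the same scores:
print(np.broadcast_shapes((1, 8, 2, 32, 36), (32, 36)))
```

This suggests the mask is being built for the wrong key/value length relative to the cached sequence, which is consistent with the cache-handling mix-up described below.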
This is on the latest relevant upstream releases:
pip show mlx mlx_lm
Name: mlx
Version: 0.15.2
Summary: A framework for machine learning on Apple silicon.
Home-page: https://github.com/ml-explore/mlx
Author: MLX Contributors
Author-email: mlx@group.apple.com
License:
Location: /Users/uche/.local/venv/temp/lib/python3.11/site-packages
Requires:
Required-by: mlx-lm, Toolio
---
Name: mlx-lm
Version: 0.15.0
Summary: LLMs on Apple silicon with MLX and the Hugging Face Hub
Home-page: https://github.com/ml-explore/mlx-examples
Author: MLX Contributors
Author-email: mlx@group.apple.com
License: MIT
Location: /Users/uche/.local/venv/temp/lib/python3.11/site-packages
Requires: jinja2, mlx, numpy, protobuf, pyyaml, transformers
Required-by: Toolio
Upstream issue: https://github.com/ml-explore/mlx-examples/issues/868
Pushed the system role fix, but will keep this open while an upstream fix or workaround is in progress.
Forgot to tag that last commit: 078d815
Looks like it wasn't an upstream bug, but rather a mix-up on my part. I think Gemma should be GTG.
Originally reported by Mark Lord.
Repro steps:
Try a request such as:
The resulting exception is mangled and useless; the final stanza:
Looks like llm-structured-output interpolates a system prompt, and Gemma refuses this. Indeed, from
$HOME/.local/share/models/mlx/Gemma-2-9B-It-SPPO-Iter3-8bit/tokenizer_config.json:
Luckily this looks like one we can patch locally in server.py, rather than needing it upstream in llm-structured-output.
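Since Gemma's chat template rejects the `system` role outright, one shape such a local patch could take is folding any leading system message into the first user turn before the template is applied. This is only an illustrative sketch, not Toolio's or llm-structured-output's actual code; the function name and OpenAI-style message dicts are my assumptions:

```python
# Hypothetical helper (illustrative only, not the project's real API):
# merge a leading "system" message into the first "user" message so
# Gemma's chat template, which raises on the system role, never sees it.
def fold_system_into_user(messages):
    """Return a message list with any leading system turn folded away."""
    if not messages or messages[0].get("role") != "system":
        return messages  # nothing to do
    system, rest = messages[0], messages[1:]
    if rest and rest[0].get("role") == "user":
        merged = dict(rest[0])
        merged["content"] = system["content"] + "\n\n" + rest[0]["content"]
        return [merged] + rest[1:]
    # No user turn to merge into: re-label the system message as user.
    return [{"role": "user", "content": system["content"]}] + rest
```

A call like `fold_system_into_user([{"role": "system", "content": "Be brief."}, {"role": "user", "content": "Hi"}])` would yield a single user message carrying both texts, which Gemma's template accepts.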