EricLBuehler / mistral.rs

Blazingly fast LLM inference.

metal phi3 --dtype bf16 "Function 'cast_f32_bf16' does not exist" #761

Open jk2K opened 2 months ago

jk2K commented 2 months ago

Describe the bug

cargo run --features metal --package mistralrs-server --bin mistralrs-server -- --token-source cache -i plain -m microsoft/Phi-3.5-mini-instruct -a phi3 --dtype bf16

error message

.4800033569336, 64.51000213623047, 64.52999877929688, 64.83999633789063], scaling_type: Su }), max_position_embeddings: 131072, use_flash_attn: false, sliding_window: Some(262144), original_max_position_embeddings: 4096, quantization_config: None }
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 128/128 [00:13<00:00, 13.48it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 67/67 [00:07<00:00, 9.84it/s]
Error: Metal error Error while loading function: "Function 'cast_f32_bf16' does not exist"
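One possible workaround (untested, and an assumption based only on the missing kernel's name): since the function that fails to load is the f32 -> bf16 cast, requesting f16 instead of bf16 may sidestep the missing kernel:

cargo run --features metal --package mistralrs-server --bin mistralrs-server -- --token-source cache -i plain -m microsoft/Phi-3.5-mini-instruct -a phi3 --dtype f16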

Latest commit or version

5fcc9d6f8c0159feb3a237d07e8b3eb191dc6474

jk2K commented 2 months ago

related to https://github.com/huggingface/candle/issues/2163
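The linked candle issue shows the same symptom: on some Metal builds, the bf16 cast kernels cannot be found at load time. A minimal repro sketch of that cast path, assuming candle_core's public API (Device::new_metal, Tensor::to_dtype) is what mistral.rs dispatches through on Metal:

use candle_core::{DType, Device, Tensor};

fn main() -> candle_core::Result<()> {
    // First Metal device, as selected by mistral.rs with --features metal.
    let dev = Device::new_metal(0)?;
    // Casting f32 -> bf16 looks up the `cast_f32_bf16` Metal function
    // named in the error; on affected builds this lookup fails.
    let t = Tensor::zeros((2, 2), DType::F32, &dev)?;
    let _bf16 = t.to_dtype(DType::BF16)?;
    Ok(())
}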

kinchahoy commented 1 week ago

Hey folks - is there a solution for this? Does it mean I can't really use mistral.rs on a Mac for Llama 3.2 Vision?

EricLBuehler commented 1 week ago

@kinchahoy are you having this issue? I cannot reproduce it on my Mac - everything works.

kinchahoy commented 1 week ago

Hey Eric - thanks for taking a look. I get the error below when I run /examples/python/llama_vision.py with the following changes:

MODEL_ID = "EricB/Llama-3.2-11B-Vision-Instruct-UQFF"

and

which=Which.VisionPlain(
    model_id=MODEL_ID,
    arch=VisionArchitecture.VLlama,
    from_uqff="llama3.2-vision-instruct-q4k.uqff"
),

❯ python llama_vision_v2.py
2024-10-28T19:33:51.245838Z INFO mistralrs_core::pipeline::vision: Loading `tokenizer.json` at `EricB/Llama-3.2-11B-Vision-Instruct-UQFF`
2024-10-28T19:33:51.246011Z INFO mistralrs_core::pipeline::vision: Loading `config.json` at `EricB/Llama-3.2-11B-Vision-Instruct-UQFF`
2024-10-28T19:33:51.543602Z INFO mistralrs_core::pipeline::paths: Found model weight filenames ["residual.safetensors"]
2024-10-28T19:33:51.684169Z INFO mistralrs_core::pipeline::vision: Loading `generation_config.json` at `EricB/Llama-3.2-11B-Vision-Instruct-UQFF`
2024-10-28T19:33:51.800356Z INFO mistralrs_core::pipeline::vision: Loading `preprocessor_config.json` at `EricB/Llama-3.2-11B-Vision-Instruct-UQFF`
2024-10-28T19:33:51.912937Z INFO mistralrs_core::pipeline::vision: Loading `tokenizer_config.json` at `EricB/Llama-3.2-11B-Vision-Instruct-UQFF`
2024-10-28T19:35:54.736120Z INFO mistralrs_core::pipeline::vision: Loading model `EricB/Llama-3.2-11B-Vision-Instruct-UQFF` on metal[4294968663].
2024-10-28T19:35:54.736198Z INFO mistralrs_core::pipeline::vision: Model config: MLlamaConfig { vision_config: MLlamaVisionConfig { hidden_size: 1280, hidden_act: Gelu, num_hidden_layers: 32, num_global_layers: 8, num_attention_heads: 16, num_channels: 3, intermediate_size: 5120, vision_output_dim: 7680, image_size: 560, patch_size: 14, norm_eps: 1e-5, max_num_tiles: 4, intermediate_layers_indices: [3, 7, 15, 23, 30], supported_aspect_ratios: [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (3, 1), (4, 1)] }, text_config: MLlamaTextConfig { rope_scaling: Some(MLlamaRopeScaling { rope_type: Llama3, factor: Some(8.0), original_max_position_embeddings: 8192, attention_factor: None, beta_fast: None, beta_slow: None, short_factor: None, long_factor: None, low_freq_factor: Some(1.0), high_freq_factor: Some(4.0) }), vocab_size: 128256, hidden_size: 4096, hidden_act: Silu, num_hidden_layers: 40, num_attention_heads: 32, num_key_value_heads: 8, intermediate_size: 14336, rope_theta: 500000.0, rms_norm_eps: 1e-5, max_position_embeddings: 131072, tie_word_embeddings: false, cross_attention_layers: [3, 8, 13, 18, 23, 28, 33, 38], use_flash_attn: false, quantization_config: None } }
2024-10-28T19:35:54.745491Z INFO mistralrs_core::utils::normal: DType selected is F16.
Traceback (most recent call last):
  File "/Users/raistlin/mistral.rs/examples/python/llama_vision_v2.py", line 7, in <module>
    runner = Runner(
             ^^^^^^^
ValueError: Metal error Error while loading function: "Function 'cast_bf16_f16' does not exist"
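For context, the full modified script presumably looks roughly like the sketch below (a reconstruction assuming the structure of the repo's examples/python/llama_vision.py; the image URL and prompt are illustrative placeholders). Per the traceback, the load already fails inside Runner(...), before any request is sent:

from mistralrs import Runner, Which, ChatCompletionRequest, VisionArchitecture

MODEL_ID = "EricB/Llama-3.2-11B-Vision-Instruct-UQFF"

runner = Runner(
    which=Which.VisionPlain(
        model_id=MODEL_ID,
        arch=VisionArchitecture.VLlama,
        from_uqff="llama3.2-vision-instruct-q4k.uqff",
    ),
)

res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="llama-vision",
        messages=[
            {
                "role": "user",
                "content": [
                    # Placeholder image; any reachable URL would do here.
                    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
                    {"type": "text", "text": "Describe this image."},
                ],
            }
        ],
        max_tokens=256,
    )
)
print(res.choices[0].message.content)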

EricLBuehler commented 1 week ago

@kinchahoy could you please let me know what your hardware (chip, memory, etc) is?

kinchahoy commented 1 week ago

OS: macOS Sequoia 15.1 arm64
Host: MacBook Air (M2, 2022)
Kernel: Darwin 24.1.0
Display (Color LCD): 3420x2224 @ 60 Hz (as 1710x1112) in 14" [Built-in]
CPU: Apple M2 (8) @ 3.50 GHz
GPU: Apple M2 (10) @ 1.40 GHz [Integrated]
Memory: 9.60 GiB / 16.00 GiB (60%)
Swap: Disabled
Disk (/): 255.10 GiB / 926.35 GiB (28%) - apfs [Read-only]

Thanks again for taking a look at this Eric!