exo-explore / exo

Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
GNU General Public License v3.0

Mali GPU OpenCL does not support bfloat16 #145

Open artistlu opened 1 month ago

artistlu commented 1 month ago

I'm encountering an issue with my Mali GPU. When I try to run inference, I get the following error:

{'319cbd94-148d-4767-80af-950aa5c20-11'}}). Next partition: 
Partition(node_id='319cbd94-148d-4767-80af-950aa5c20-23', start=0, end=0.08333)
Sending tensor_or_prompt to 319cbd94-148d-4767-80af-950aa5c20-23: 
<|im_start|>user
What is the meaning of exo?<|im_end|>
<|im_start|>assistant

Broadcasting opaque status: request_id='2a8e8c0d-1226-4412-b6f0-96a54f58d817' 
status='{"type": "node_status", "node_id": 
"319cbd94-148d-4767-80af-950aa5c20-11", "status": "start_process_prompt", 
"base_shard": {"model_id": "/nasroot/models/Meta-Llama-3-8B/", "start_layer": 0,
"end_layer": 0, "n_layers": 32}, "shard": {"model_id": 
"/nasroot/models/Meta-Llama-3-8B/", "start_layer": 29, "end_layer": 31, 
"n_layers": 32}, "prompt": "<|im_start|>user\\nWhat is the meaning of 
exo?<|im_end|>\\n<|im_start|>assistant\\n", "image_str": null, 
"inference_state": null, "request_id": "2a8e8c0d-1226-4412-b6f0-96a54f58d817"}'
Traceback (most recent call last):
  File "/nasroot/code/exo_0811/exo/api/chatgpt_api.py", line 306, in 
handle_post_chat_completions
    await self.node.process_prompt(shard, prompt, image_str, 
request_id=request_id)
  File "/nasroot/code/exo_0811/exo/orchestration/standard_node.py", line 102, in
process_prompt
    resp = await self._process_prompt(base_shard, prompt, image_str, request_id,
inference_state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^
  File "/nasroot/code/exo_0811/exo/orchestration/standard_node.py", line 137, in
_process_prompt
    await self.forward_to_next_shard(shard, prompt, request_id, 
image_str=image_str, inference_state=inference_state)
  File "/nasroot/code/exo_0811/exo/orchestration/standard_node.py", line 280, in
forward_to_next_shard
    await target_peer.send_prompt(next_shard, tensor_or_prompt, 
image_str=image_str, request_id=request_id, inference_state=inference_state)
  File "/nasroot/code/exo_0811/exo/networking/grpc/grpc_peer_handle.py", line 
55, in send_prompt
    response = await self.stub.SendPrompt(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/nasroot/miniconda3/envs/exo/lib/python3.12/site-packages/grpc/aio/_call.py", 
line 318, in __await__
    raise _create_rpc_error(
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "Unexpected <class 'tinygrad.device.CompileError'>: OpenCL 
Compile Error

<source>:2:68: error: unknown type name '__bf16'
__kernel void E_131072_32_4n3(__global half* data0, const __global __bf16* 
data1) {
                                                                   ^

<source>:6:3: error: use of undeclared identifier '__bf16'
  __bf16 val0 = data1[alu0+1];
  ^

<source>:7:3: error: use of undeclared identifier '__bf16'
  __bf16 val1 = data1[alu0+2];
  ^

<source>:8:3: error: use of undeclared identifier '__bf16'
  __bf16 val2 = data1[alu0+3];
  ^

<source>:9:3: error: use of undeclared identifier '__bf16'
  __bf16 val3 = data1[alu0];
  ^

<source>:10:53: error: use of undeclared identifier 'val3'
  *((__global half4*)(data0+alu0)) = 
(half4)((half)(val3),(half)(val0),(half)(val1),(half)(val2));
                                                    ^

error: Compiler frontend failed (error code 62)
"
        debug_error_string = "UNKNOWN:Error received from peer  
{grpc_message:"Unexpected <class \'tinygrad.device.CompileError\'>: OpenCL 
Compile Error\n\n<source>:2:68: error: unknown type name \'__bf16\'\n__kernel 
void E_131072_32_4n3(__global half* data0, const __global __bf16* data1) {\n    
^\n\n<source>:6:3: error: use of undeclared identifier \'__bf16\'\n  __bf16 val0
= data1[alu0+1];\n  ^\n\n<source>:7:3: error: use of undeclared identifier 
\'__bf16\'\n  __bf16 val1 = data1[alu0+2];\n  ^\n\n<source>:8:3: error: use of 
undeclared identifier \'__bf16\'\n  __bf16 val2 = data1[alu0+3];\n  
^\n\n<source>:9:3: error: use of undeclared identifier \'__bf16\'\n  __bf16 val3
= data1[alu0];\n  ^\n\n<source>:10:53: error: use of undeclared identifier 
\'val3\'\n  *((__global half4*)(data0+alu0)) = 
(half4)((half)(val3),(half)(val0),(half)(val1),(half)(val2));\n                 
^\n\nerror: Compiler frontend failed (error code 62)\n", grpc_status:2, 
created_time:"2024-08-12T11:39:57.374975367+08:00"}"
>
Preemptively starting download for 
Shard(model_id='/nasroot/models/Meta-Llama-3-8B/', start_layer=29, end_layer=31,
n_layers=32)
Received SendOpaqueStatus request: 
request_id='2a8e8c0d-1226-4412-b6f0-96a54f58d817' status='{"type": 
"node_status", "node_id": "319cbd94-148d-4767-80af-950aa5c20-23", "status": 
"start_process_prompt", "base_shard": {"model_id": 
"/nasroot/models/Meta-Llama-3-8B/", "start_layer": 0, "end_layer": 1, 
"n_layers": 32}, "shard": {"model_id": "/nasroot/models/Meta-Llama-3-8B/", 
"start_layer": 0, "end_layer": 1, "n_layers": 32}, "prompt": 
"<|im_start|>user\\nWhat is the meaning of 
exo?<|im_end|>\\n<|im_start|>assistant\\n", "image_str": "", "inference_state": 
null, "request_id": "2a8e8c0d-1226-4412-b6f0-96a54f58d817"}'
Preemptively starting download for 
Shard(model_id='/nasroot/models/Meta-Llama-3-8B/', start_layer=29, end_layer=31,
n_layers=32)
Download already in progress for 
Shard(model_id='/nasroot/models/Meta-Llama-3-8B/', start_layer=29, end_layer=31,
n_layers=32). Keeping that one.

AlexCheema commented 1 month ago

Try running with SUPPORT_BF16=0, e.g. SUPPORT_BF16=0 python3 main.py. Can you let me know if that works?

Ideally we detect this automatically.
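
One way such automatic detection could look is a probe compile: try to build a tiny kernel that uses `__bf16` and treat a compiler failure as "not supported". The sketch below is only an illustration under stated assumptions, not exo's or tinygrad's implementation. `device_supports_bf16`, `BF16_PROBE_SRC`, and the injected `compile_kernel` callable are hypothetical names, and reading `SUPPORT_BF16=0` as "bf16 unsupported" is an assumption based on the suggestion above.

```python
import os
from typing import Callable, Mapping

# Hypothetical probe kernel: it only needs to mention the __bf16 type,
# which is what the failing kernels in the log above trip over.
BF16_PROBE_SRC = "__kernel void probe(__global __bf16* data0) { }"


def device_supports_bf16(
    compile_kernel: Callable[[str], object],
    environ: Mapping[str, str] = os.environ,
) -> bool:
    """Guess whether the device compiler accepts bfloat16.

    compile_kernel is a stand-in for whatever compiles OpenCL source on the
    target device; it is expected to raise an exception on a compile error.
    """
    # An explicit SUPPORT_BF16 setting wins over the probe (assumed semantics:
    # "0" means unsupported, anything else means supported).
    override = environ.get("SUPPORT_BF16")
    if override is not None:
        return override != "0"

    try:
        compile_kernel(BF16_PROBE_SRC)
    except Exception:
        # Any compile failure is treated as "bf16 not available".
        return False
    return True
```

The compile function is passed in as a parameter so the check stays independent of any particular backend; a caller would supply the real device compiler and fall back to float16 or float32 weights when the function returns False.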

artistlu commented 1 month ago

> Try running with SUPPORT_BF16=0 e.g. SUPPORT_BF16=0 python3 main.py. Can you let me know if that works?
>
> Ideally we detect this automatically.

In order to load a local model, I have made modifications to the following two methods:

(Two screenshots of the modified methods were attached here.)

I also set the environment variable SUPPORT_BF16=0 when starting exo.

I am encountering the following error:

{'319cbd94-148d-4767-80af-950aa5c20-11'}}). Next partition: 
Partition(node_id='319cbd94-148d-4767-80af-950aa5c20-23', start=0, end=0.07692)
Sending tensor_or_prompt to 319cbd94-148d-4767-80af-950aa5c20-23: 
<|im_start|>user
What is the meaning of exo?<|im_end|>
<|im_start|>assistant

Broadcasting opaque status: request_id='cb27eced-cd1c-48d8-9ef9-0f4e680346fe' 
status='{"type": "node_status", "node_id": 
"319cbd94-148d-4767-80af-950aa5c20-11", "status": "start_process_prompt", 
"base_shard": {"model_id": "/nasroot/models/Meta-Llama-3-8B/", "start_layer": 0,
"end_layer": 0, "n_layers": 32}, "shard": {"model_id": 
"/nasroot/models/Meta-Llama-3-8B/", "start_layer": 29, "end_layer": 31, 
"n_layers": 32}, "prompt": "<|im_start|>user\\nWhat is the meaning of 
exo?<|im_end|>\\n<|im_start|>assistant\\n", "image_str": null, 
"inference_state": null, "request_id": "cb27eced-cd1c-48d8-9ef9-0f4e680346fe"}'
Traceback (most recent call last):
  File "/nasroot/code/exo_0814/exo/api/chatgpt_api.py", line 306, in 
handle_post_chat_completions
    await self.node.process_prompt(shard, prompt, image_str, 
request_id=request_id)
  File "/nasroot/code/exo_0814/exo/orchestration/standard_node.py", line 102, in
process_prompt
    resp = await self._process_prompt(base_shard, prompt, image_str, request_id,
inference_state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^
  File "/nasroot/code/exo_0814/exo/orchestration/standard_node.py", line 137, in
_process_prompt
    await self.forward_to_next_shard(shard, prompt, request_id, 
image_str=image_str, inference_state=inference_state)
  File "/nasroot/code/exo_0814/exo/orchestration/standard_node.py", line 280, in
forward_to_next_shard
    await target_peer.send_prompt(next_shard, tensor_or_prompt, 
image_str=image_str, request_id=request_id, inference_state=inference_state)
  File "/nasroot/code/exo_0814/exo/networking/grpc/grpc_peer_handle.py", line 
55, in send_prompt
    response = await self.stub.SendPrompt(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File 
"/nasroot/miniconda3/envs/exo/lib/python3.12/site-packages/grpc/aio/_call.py", 
line 318, in __await__
    raise _create_rpc_error(
grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "Unexpected <class 'tinygrad.device.CompileError'>: OpenCL 
Compile Error

<source>:1:38: error: unknown type name '__bf16'
__kernel void E_131072_32_4(__global __bf16* data0) {
                                     ^

<source>:5:20: error: use of undeclared identifier '__bf16'
  data0[alu0+1] = (__bf16)(0.0);
                   ^

<source>:5:28: warning: double precision constant requires cl_khr_fp64, casting 
to single precision
  data0[alu0+1] = (__bf16)(0.0);
                           ^

<source>:6:20: error: use of undeclared identifier '__bf16'
  data0[alu0+2] = (__bf16)(0.0);
                   ^

<source>:6:28: warning: double precision constant requires cl_khr_fp64, casting 
to single precision
  data0[alu0+2] = (__bf16)(0.0);
                           ^

<source>:7:20: error: use of undeclared identifier '__bf16'
  data0[alu0+3] = (__bf16)(0.0);
                   ^

<source>:7:28: warning: double precision constant requires cl_khr_fp64, casting 
to single precision
  data0[alu0+3] = (__bf16)(0.0);
                           ^

<source>:8:18: error: use of undeclared identifier '__bf16'
  data0[alu0] = (__bf16)(0.0);
                 ^

<source>:8:26: warning: double precision constant requires cl_khr_fp64, casting 
to single precision
  data0[alu0] = (__bf16)(0.0);
                         ^

error: Compiler frontend failed (error code 62)
"
        debug_error_string = "UNKNOWN:Error received from peer  
{grpc_message:"Unexpected <class \'tinygrad.device.CompileError\'>: OpenCL 
Compile Error\n\n<source>:1:38: error: unknown type name \'__bf16\'\n__kernel 
void E_131072_32_4(__global __bf16* data0) {\n                                  
^\n\n<source>:5:20: error: use of undeclared identifier \'__bf16\'\n  
data0[alu0+1] = (__bf16)(0.0);\n                   ^\n\n<source>:5:28: warning: 
double precision constant requires cl_khr_fp64, casting to single precision\n  
data0[alu0+1] = (__bf16)(0.0);\n                           ^\n\n<source>:6:20: 
error: use of undeclared identifier \'__bf16\'\n  data0[alu0+2] = 
(__bf16)(0.0);\n                   ^\n\n<source>:6:28: warning: double precision
constant requires cl_khr_fp64, casting to single precision\n  data0[alu0+2] = 
(__bf16)(0.0);\n                           ^\n\n<source>:7:20: error: use of 
undeclared identifier \'__bf16\'\n  data0[alu0+3] = (__bf16)(0.0);\n            
^\n\n<source>:7:28: warning: double precision constant requires cl_khr_fp64, 
casting to single precision\n  data0[alu0+3] = (__bf16)(0.0);\n                 
^\n\n<source>:8:18: error: use of undeclared identifier \'__bf16\'\n  
data0[alu0] = (__bf16)(0.0);\n                 ^\n\n<source>:8:26: warning: 
double precision constant requires cl_khr_fp64, casting to single precision\n  
data0[alu0] = (__bf16)(0.0);\n                         ^\n\nerror: Compiler 
frontend failed (error code 62)\n", grpc_status:2, 
created_time:"2024-08-14T10:50:09.810393599+08:00"}"
>
Received SendOpaqueStatus request: 
request_id='cb27eced-cd1c-48d8-9ef9-0f4e680346fe' status='{"type": 
"node_status", "node_id": "319cbd94-148d-4767-80af-950aa5c20-23", "status": 
"start_process_prompt", "base_shard": {"model_id": 
"/nasroot/models/Meta-Llama-3-8B/", "start_layer": 0, "end_layer": 1, 
"n_layers": 32}, "shard": {"model_id": "/nasroot/models/Meta-Llama-3-8B/", 
"start_layer": 0, "end_layer": 1, "n_layers": 32}, "prompt": 
"<|im_start|>user\\nWhat is the meaning of 
exo?<|im_end|>\\n<|im_start|>assistant\\n", "image_str": "", "inference_state": 
null, "request_id": "cb27eced-cd1c-48d8-9ef9-0f4e680346fe"}'
Preemptively starting download for 
Shard(model_id='/nasroot/models/Meta-Llama-3-8B/', start_layer=29, end_layer=31,
n_layers=32)
Preemptively starting download for 
Shard(model_id='/nasroot/models/Meta-Llama-3-8B/', start_layer=29, end_layer=31,
n_layers=32)

I'm not sure whether this is a tinygrad issue. Could updating to the latest tinygrad version solve it? My device is not connected to the internet, so everything has to be copied to the node and run there. @AlexCheema
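
For context on why a device-side `__bf16` type is not strictly required: a bfloat16 value is the upper 16 bits of an IEEE 754 float32, so bf16 data can in principle be widened on the host before it reaches an OpenCL compiler that lacks the type. The following is a minimal standard-library sketch of that conversion. It is not what exo or tinygrad does, and the function names are made up for illustration.

```python
import struct


def bf16_bits_to_float(bits: int) -> float:
    """Widen one bfloat16 value, given as a 16-bit integer, to a Python float."""
    # Put the 16 bf16 bits in the upper half of a 32-bit word; the lower
    # 16 mantissa bits of the float32 are simply zero.
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def bf16_bytes_to_floats(raw: bytes) -> list[float]:
    """Decode a buffer of little-endian bfloat16 values into floats."""
    if len(raw) % 2 != 0:
        raise ValueError("bfloat16 buffer length must be a multiple of 2")
    count = len(raw) // 2
    return [bf16_bits_to_float(b) for b in struct.unpack(f"<{count}H", raw)]
```

For example, the bit pattern 0x3F80 widens to 0x3F800000, which is 1.0 as a float32. A real loader would do this in bulk rather than element by element in Python.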