JohnLee1360 opened 1 day ago
Can you try running with `SUPPORT_BF16=0`?
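For context, a flag like this is usually read from the environment at startup. Here's a minimal sketch of how such a flag might gate dtype selection when weights are loaded, assuming a typical implementation (the function and names are illustrative, not the project's actual code):

```python
import os

# Illustrative sketch, not the project's actual code: engines often read
# a flag like SUPPORT_BF16 once at startup and use it to decide what dtype
# bf16 checkpoint tensors are cast to at load time.
SUPPORT_BF16 = int(os.getenv("SUPPORT_BF16", "1"))

def load_dtype(checkpoint_dtype: str) -> str:
    """Return the dtype a checkpoint tensor should be loaded as."""
    if checkpoint_dtype == "bfloat16" and not SUPPORT_BF16:
        # Cast bf16 weights to a universally supported float format
        # on hardware/backends without native bf16 support.
        return "float32"
    return checkpoint_dtype
```

If the flag works this way, it would be set in the shell before launching, e.g. `SUPPORT_BF16=0 <launch command>`.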
Thanks for the reply! I assume `SUPPORT_BF16=0` means the weights are quantized to a smaller format at some cost in accuracy, right?
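As a rough aside on what precision means for memory (my own back-of-envelope numbers, not anything measured in this thread): an 8B-parameter model's weight footprint scales directly with bytes per parameter, which is why 16 GB machines are tight. Note that if disabling bf16 casts weights *up* to fp32, the footprint grows rather than shrinks.

```python
# Back-of-envelope weight memory for an 8B-parameter model.
# Rough numbers for intuition only, not measurements from this issue.
params = 8e9
for fmt, bytes_per_param in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    gib = params * bytes_per_param / 2**30
    print(f"{fmt:>9}: ~{gib:.1f} GiB of weights")
# Output:
#      fp32: ~29.8 GiB
# bf16/fp16: ~14.9 GiB
#      int8: ~7.5 GiB
```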
I gave it a shot, but it doesn't work. Maybe the problem stems from the machines having different chip architectures?
I am trying to use two MacBooks to run a Llama 8B model, but the weights never finish loading for inference; the progress stays stuck at 0...
Here's the info on my equipment:
- node1: MacBook Air, 16 GB, M3 chip
- node2: MacBook Pro, 16 GB, M1 chip (Intel-based)
Since my machines' resources are limited, both of them run the tinygrad inference engine rather than MLX. I also wonder why my MacBook Pro shows 0 TFLOPS?
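One hedged guess at the 0 TFLOPS reading, sketched under the assumption that the engine estimates TFLOPS from a static table keyed by chip name (the table and function below are illustrative, not the project's actual code): an unrecognized chip, such as an Intel-based MacBook Pro, would miss the table and fall back to 0.

```python
# Illustrative sketch only, not the project's actual implementation:
# a capability estimate looked up from a static table keyed by chip name,
# with 0.0 as the fallback for chips the table doesn't know about.
CHIP_TFLOPS = {
    "Apple M1": 2.6,  # placeholder figures
    "Apple M3": 3.5,
}

def estimated_tflops(chip_name: str) -> float:
    # An Intel chip name misses every key, so the UI would report
    # 0 TFLOPS even though the machine can do useful work.
    return CHIP_TFLOPS.get(chip_name, 0.0)

print(estimated_tflops("Apple M3"))       # 3.5
print(estimated_tflops("Intel Core i7"))  # 0.0
```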
I'd really appreciate it if someone could help!