I'm attempting to run the Starcoder model on a Mac M2 with 32GB of memory using the Transformers library in a CPU environment. Despite setting load_in_8bit=True, I'm encountering an error during execution. Below is the relevant code:
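Since the original code snippet did not survive, here is a minimal sketch of the kind of loading code that triggers this error; the `bigcode/starcoder` checkpoint name and the function wrapper are assumptions for illustration:

```python
# Sketch of the loading code that reproduces the error (checkpoint
# name is an assumption; the exact original snippet was not shown).
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_starcoder_8bit(checkpoint="bigcode/starcoder"):
    # load_in_8bit=True asks bitsandbytes to quantize the weights.
    # bitsandbytes' 8-bit kernels require a CUDA GPU, so on a
    # CPU-only machine this call raises the ValueError shown below.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint,
        device_map="auto",
        load_in_8bit=True,
    )
    return tokenizer, model
```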
While running the above, I receive the following warning and exception:
Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated.
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8.
Warning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
Error:
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom
`device_map` to `from_pretrained`.
Hi, I believe you'll need a GPU to quantize your model: bitsandbytes' 8-bit kernels require CUDA, as the warning in your log indicates. If you're running on CPU only, you should not set load_in_8bit=True. Please refer to this part of the documentation for further details.
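For a CPU-only run, the fix is simply to drop the 8-bit flag. A sketch, with the checkpoint name and dtype choice as assumptions (note that StarCoder has ~15.5B parameters, so even half-precision weights will use most of 32GB of RAM):

```python
# CPU-friendly loading without bitsandbytes quantization
# (checkpoint name assumed; bfloat16 halves memory vs. float32).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_starcoder_cpu(checkpoint="bigcode/starcoder"):
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint,
        torch_dtype=torch.bfloat16,  # or omit for the float32 default
    )
    return tokenizer, model
```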