bigcode-project / starcoder

Home of StarCoder: fine-tuning & inference!

Issue with running Starcoder Model on Mac M2 with Transformers library in CPU environment #134

Open · code2graph opened this issue 1 year ago

code2graph commented 1 year ago

I'm attempting to run the Starcoder model on a Mac M2 with 32GB of memory using the Transformers library in a CPU environment. Despite setting load_in_8bit=True, I'm encountering an error during execution. Below is the relevant code:

from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"
device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

model = AutoModelForCausalLM.from_pretrained(checkpoint,
                                             device_map="auto",
                                             load_in_8bit=True)
print(f"Memory footprint: {model.get_memory_footprint() / 1e6:.2f} MB")

inputs = tokenizer.encode("def print_hello_world():", return_tensors="pt").to(device)
outputs = model.generate(inputs)

print(tokenizer.decode(outputs[0], clean_up_tokenization_spaces=False))

Running the above produces the following warnings, followed by an exception:

Intel MKL WARNING: Support of Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) enabled only processors has been deprecated.
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8.
Warning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.

Error:
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules
in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`.
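For reference, what the error message is asking for looks roughly like the sketch below. This is a minimal illustration, not a verified fix: the int8 modules still require a CUDA GPU, so it would not help on an Apple M2, and the module names in the device_map are placeholders (inspect the loaded model to see the real ones). In recent transformers versions the offload flag lives on BitsAndBytesConfig as llm_int8_enable_fp32_cpu_offload.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Keep int8-quantized modules on GPU 0 and offload the rest to CPU in fp32.
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

# Placeholder device_map: the module names here are illustrative, not the
# exact layout of bigcode/starcoder.
device_map = {
    "transformer.wte": 0,
    "transformer.wpe": 0,
    "transformer.h": 0,
    "transformer.ln_f": "cpu",
    "lm_head": "cpu",
}

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    device_map=device_map,
    quantization_config=quant_config,
)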
ArmelRandy commented 1 year ago

Hi, I believe you'll need a GPU to quantize the model. If you're running on CPU, you probably don't want to set load_in_8bit=True. Please refer to this part of the documentation for further details.
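For a CPU-only run, a plain load without quantization would look something like the sketch below. This is a minimal example, with one caveat: StarCoder has roughly 15.5B parameters, so even in bfloat16 the weights alone are about 31 GB, which is very tight on a 32 GB machine; a smaller checkpoint may be the more practical route.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# No bitsandbytes involved: load directly on CPU. bfloat16 halves the
# memory footprint relative to fp32 and works on CPU with recent PyTorch.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)

inputs = tokenizer("def print_hello_world():", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))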