Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
https://lightning.ai
Apache License 2.0

BFloat16 is not supported on MPS #498

Closed · darebfh closed this 8 months ago

darebfh commented 9 months ago

I followed the setup guide to run inference with stablelm-base-alpha-3b.

Running

python scripts/download.py --repo_id stabilityai/stablelm-base-alpha-3b

python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/stabilityai/stablelm-base-alpha-3b

works like a charm.

However, when I try to run the model with python generate/base.py --prompt "Hello, my name is", I get

TypeError: BFloat16 is not supported on MPS

Obviously, I am working on an Apple M1 Max. Going through the tutorials, I did not find any additional requirements for running lit-gpt on Apple Silicon.

rasbt commented 9 months ago

Hi there,

you could try using

python generate/base.py --prompt "Hello, my name is" --precision 16-true

(I think the default is --precision bf16-true, which may be why you are getting this error.) Let me know if this works; if so, we should update the code or documentation accordingly.
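(For reference, the failure can be reproduced directly in PyTorch; a minimal sketch, assuming a PyTorch build that does not support bf16 on the Metal backend:)

import torch

# On PyTorch builds without bf16 support on Apple's Metal backend,
# allocating a bfloat16 tensor on MPS raises the TypeError above.
if torch.backends.mps.is_available():
    try:
        torch.ones(1, dtype=torch.bfloat16, device="mps")
    except TypeError as e:
        print(e)  # BFloat16 is not supported on MPS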

Andrei-Aksionov commented 9 months ago

Oops, my bad 😊 I'll shortly update the code that defines the default precision.
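(A sketch of what such a default could look like; get_precision below is a hypothetical helper, not lit-gpt's actual implementation:)

import torch

def get_precision(training: bool = False) -> str:
    # Hypothetical helper: MPS has no bf16 support, so fall back to fp16
    # there; prefer bf16 only where the accelerator supports it.
    if torch.backends.mps.is_available():
        return "16-mixed" if training else "16-true"
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return "bf16-mixed" if training else "bf16-true"
    return "16-mixed" if training else "16-true"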

darebfh commented 9 months ago

Hey, thanks for the quick reply!

Unfortunately, there is more that is not implemented (yet?) on MPS. Upon running python generate/base.py --prompt "Hello, my name is" --precision 16-true I get the following error:

NotImplementedError: The operator 'aten::index_copy.out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

carmocca commented 9 months ago

@darebfh Can you try rewriting https://github.com/Lightning-AI/lit-gpt/blob/main/lit_gpt/model.py#L237-L238 to do index_copy( instead? (no underscore)
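(For context, the two variants differ only in whether they mutate their input; a minimal CPU sketch:)

import torch

x = torch.zeros(5, 3)
t = torch.ones(3, 3)
index = torch.tensor([0, 4, 2])

x.index_copy_(0, index, t)  # in-place: mutates x and returns it
y = torch.zeros(5, 3).index_copy(0, index, t)  # out-of-place: returns a new tensor
assert torch.equal(x, y)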

If this doesn't work, you can also try setting PYTORCH_ENABLE_MPS_FALLBACK=1 as the error message suggests
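(The variable has to be visible when PyTorch starts up, so either export it in the shell or set it before the torch import; a sketch:)

import os

# Must be set before `import torch`; the fallback flag is read when
# PyTorch initializes the MPS backend.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # unsupported MPS ops now fall back to the CPU (slower)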

Andrei-Aksionov commented 9 months ago

I tried it on my MPS device with a Radeon GPU, and index_copy didn't work:

import torch

device = "mps"
x = torch.zeros(5, 3, device=device)
t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float, device=device)
index = torch.tensor([0, 4, 2], device=device)
x.index_copy(0, index, t)  # out-of-place variant, no underscore

>> The operator 'aten::index_copy.out' is not currently implemented for the MPS device. ...

Surprisingly, PYTORCH_ENABLE_MPS_FALLBACK=1 also didn't work. I'm curious whether it will work on Apple Silicon.

carmocca commented 9 months ago

Oh that's unfortunate. Does x[index] = t work?

Andrei-Aksionov commented 9 months ago

> Oh that's unfortunate. Does x[index] = t work?

Yep, it works. There used to be an issue with index copying on MPS devices, but it has since been fixed: https://github.com/pytorch/pytorch/issues/101936
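(A quick CPU check that the advanced-indexing form matches index_copy_ for unique indices; a minimal sketch:)

import torch

x1 = torch.zeros(5, 3)
x2 = torch.zeros(5, 3)
t = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
index = torch.tensor([0, 4, 2])

x1.index_copy_(0, index, t)
x2[index] = t  # advanced indexing dispatches to index_put_, which MPS implements

assert torch.equal(x1, x2)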

darebfh commented 8 months ago

> Surprisingly, PYTORCH_ENABLE_MPS_FALLBACK=1 also didn't work. I'm curious whether it will work on Apple Silicon.

Yes, it works, at 4.4 tokens/sec.
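(For reference, the full invocation that worked, presumably combined with the earlier precision workaround:)

PYTORCH_ENABLE_MPS_FALLBACK=1 python generate/base.py --prompt "Hello, my name is" --precision 16-true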

Andrei-Aksionov commented 8 months ago

Since the issue with automatically applying bf16 precision on MPS devices was fixed in one of the latest commits, I think we can close this issue.

The PYTORCH_ENABLE_MPS_FALLBACK=1 env variable is more of a workaround for ops that are not yet supported, so it's a temporary measure. I hesitate to even add info about it to the README.

carmocca commented 8 months ago

I agree. Thanks Andrei!