InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] TypeError: Got unsupported ScalarType BFloat16 #2453

Closed: SeitaroShinagawa closed this issue 1 month ago

SeitaroShinagawa commented 1 month ago

Describe the bug

I hit a TypeError when running the logits & ppl example.

Reproduction

I executed the following script in ipython. Note: it is taken from "An example to calculate logits & ppl".

from transformers import AutoTokenizer
from lmdeploy import pipeline
model_repoid_or_path='internlm/internlm2_5-7b-chat'
pipe = pipeline(model_repoid_or_path)
tokenizer = AutoTokenizer.from_pretrained(model_repoid_or_path, trust_remote_code=True)

# logits
messages = [
    {"role": "user", "content": "Hello, how are you?"},
]
input_ids = tokenizer.apply_chat_template(messages)
logits = pipe.get_logits(input_ids)

# ppl
ppl = pipe.get_ppl(input_ids)

Environment

Python: 3.10.12
LMDeploy: 0.6.0a0+edcdd8e (installed by `pip install -e .`)

Error traceback

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[1], line 15
     12 logits = pipe.get_logits(input_ids)
     14 # ppl
---> 15 ppl = pipe.get_ppl(input_ids)

File /home/lmdeploy/lmdeploy/serve/utils.py:250, in LogitsMixin.get_ppl(self, input_ids)
    248 loss_sum = torch.sum(all_loss_matrix * all_target_mask, dim=1)
    249 loss_avg = loss_sum / target_count
--> 250 loss_avg = loss_avg.cpu().numpy()
    251 return loss_avg

TypeError: Got unsupported ScalarType BFloat16

I confirmed that the problem is fixed by replacing `loss_avg = loss_avg.cpu().numpy()` with `loss_avg = loss_avg.cpu().float().numpy()`.
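The failure and the one-line fix can be reproduced outside of LMDeploy in a few lines (a minimal sketch, assuming PyTorch is installed; NumPy has no bfloat16 dtype, so the direct conversion fails):

```python
import torch

# bfloat16 has no NumPy dtype, so .numpy() raises the reported TypeError
t = torch.ones(3, dtype=torch.bfloat16)
try:
    t.cpu().numpy()
    failed = False
except TypeError:
    failed = True  # TypeError: Got unsupported ScalarType BFloat16

# Casting to float32 before the conversion avoids the error
arr = t.cpu().float().numpy()
```

The same cast is what the proposed patch inserts into `get_ppl`.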

I think it's just a small fix. I can create a PR if the maintainers would prefer that.

irexyc commented 1 month ago

It would be great if you could submit a PR to fix it.

The problem is probably that you used the PyTorch backend, which returns a bfloat16 tensor (the turbomind backend outputs a float32 tensor). And I think it would be better to change this line to _logits = _logits.float().cpu()
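The two variants differ only in whether the cast happens before or after the host copy; both yield the same float32 result that NumPy can accept (a minimal sketch, assuming PyTorch):

```python
import torch

x = torch.randn(2, 4).to(torch.bfloat16)

# Suggested order: cast to float32 on the source device, then copy to host
a = x.float().cpu()
# Alternative order: copy the bfloat16 tensor to host, then cast
b = x.cpu().float()

# bfloat16 -> float32 is exact, so both orders give identical tensors
same = a.dtype == torch.float32 and torch.equal(a, b)
```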

SeitaroShinagawa commented 1 month ago

Thank you for your suggestion. I followed it and created the PR.