Lightning-AI / litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
https://lightning.ai
Apache License 2.0

Tensor parallelism generates non-sensical outputs #1663

Open rasbt opened 2 months ago

rasbt commented 2 months ago

Bug description

For some reason, the tensor parallel implementation generates nonsensical outputs:

⚡ python-api-tensor-parallel ~/litgpt litgpt generate_tp checkpoints/microsoft/phi-2 
...
Instruct: What food do llamas eat?
Output: When the
.

The first

.

The first

.

Time for inference 1: 1.31 sec total, 15.23 tokens/sec

Expected output (e.g., via base or sequential generation):

Instruct: What food do llamas eat?
Output: Llamas eat grass, shrubs, and other vegetation.

What operating system are you using?

Linux

LitGPT Version

Current main branch

rasbt commented 2 months ago

It seems to be related to the MLP class:

Has problem:

Is fine:

This might get fixed automatically via #1421.
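
One way to narrow down an MLP-related mismatch like this is to check that a sharded MLP reproduces the unsharded result on a tiny input. The following is a minimal sketch in pure Python, not LitGPT's code: the function names (`mlp_reference`, `mlp_tensor_parallel`), the sizes, and the simulated "ranks" are all hypothetical. It assumes a plain two-layer MLP with GELU, where the first layer is split by output features, the second by input features, and the partial outputs are summed (standing in for an all-reduce) before the output bias is added once.

```python
import math
import random


def gelu(x):
    # Exact (erf-based) GELU
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight, bias=None):
    # weight is laid out as (out_features, in_features)
    out = [sum(w * v for w, v in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


def mlp_reference(x, fc_w, fc_b, proj_w, proj_b):
    # Unsharded baseline: fc -> GELU -> proj
    hidden = [gelu(h) for h in linear(x, fc_w, fc_b)]
    return linear(hidden, proj_w, proj_b)


def mlp_tensor_parallel(x, fc_w, fc_b, proj_w, proj_b, world_size):
    # Hypothetical simulation: every "rank" runs in this one process.
    hidden_size = len(fc_w)
    assert hidden_size % world_size == 0
    shard = hidden_size // world_size
    partials = []
    for rank in range(world_size):
        lo, hi = rank * shard, (rank + 1) * shard
        # Column-parallel: this rank owns a slice of fc's output features
        h = [gelu(v) for v in linear(x, fc_w[lo:hi], fc_b[lo:hi])]
        # Row-parallel: this rank owns the matching slice of proj's input features
        partials.append(linear(h, [row[lo:hi] for row in proj_w]))
    # Stand-in for all-reduce: sum the partial outputs across ranks
    out = [sum(vals) for vals in zip(*partials)]
    # Add the output bias once, after the reduction, not once per rank
    return [o + b for o, b in zip(out, proj_b)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_embd, hidden = 4, 8
x = [rng.uniform(-1.0, 1.0) for _ in range(n_embd)]
fc_w, fc_b = rand_matrix(hidden, n_embd, rng), [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
proj_w, proj_b = rand_matrix(n_embd, hidden, rng), [rng.uniform(-1.0, 1.0) for _ in range(n_embd)]

reference = mlp_reference(x, fc_w, fc_b, proj_w, proj_b)
sharded = mlp_tensor_parallel(x, fc_w, fc_b, proj_w, proj_b, world_size=2)
max_diff = max(abs(a - b) for a, b in zip(reference, sharded))
print(f"max abs difference: {max_diff:.2e}")
```

If the same comparison on the real model shows a large difference for one MLP variant but not another, that would point at how that variant's weights are split rather than at the generation loop.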