Lightning-AI / lightning-thunder

Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors at once; across one or thousands of GPUs.
Apache License 2.0

Incompatibility with HF Model Qwen2-1.5B - Tensor Indexing Error (1-D vs 2-D) #1135

Status: Open. mjmikulski opened this issue 2 months ago

mjmikulski commented 2 months ago

🚀 Model / language coverage

I encountered an issue while attempting to use thunder.jit with models outside of the lit-gpt universe, specifically the Hugging Face model Qwen2-1.5B-Instruct. The following error is thrown:

RuntimeError: Advanced indexing currently only supports zero or one-dimensional integer tensors, but found a tensor with dtype int64 and 2 dimensions.

The shape of the tensor in question is actually (1, 1024), which could potentially be handled with squeeze().
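For reference, the indexing pattern behind this error can be reproduced in plain PyTorch without Thunder. The sketch below uses a made-up 10x4 table and a (1, 5) index tensor as stand-ins (both hypothetical, chosen only for illustration). It shows that squeezing the index also drops the leading dimension from the result, so a squeeze-based workaround would need to restore it afterwards.

```python
import torch

# Hypothetical stand-in for an embedding table: 10 rows, 4 features.
weight = torch.arange(40, dtype=torch.float32).reshape(10, 4)

# Integer index with a leading dimension of size 1, like the (1, 1024) tensor.
idx = torch.tensor([[3, 1, 4, 1, 5]])  # shape (1, 5), dtype int64

# Eager PyTorch accepts the 2-D index directly.
full = weight[idx]                 # shape (1, 5, 4)

# Squeezing the index first yields a result without the leading dimension.
squeezed = weight[idx.squeeze(0)]  # shape (5, 4)

# Re-adding the dimension recovers the original result.
restored = squeezed.unsqueeze(0)   # shape (1, 5, 4)
```

Squeezing only helps when every extra dimension has size 1; an index of shape (2, 512), for example, would still be 2-D after squeeze().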

Pitch

Supporting this case could enable compatibility with Qwen2-1.5B-Instruct and possibly with other models from the Qwen family.

Alternatives / Potential work-arounds

Adding the following code to the function _advanced_indexing in thunder/clang/__init__.py gets past the error above:

if isinstance(x, TensorLike):
    dims_to_squeeze = tuple([i for i, d in enumerate(x.shape) if d == 1])
    if len(dims_to_squeeze) > 0:
        x = prims.squeeze(x, dims_to_squeeze)

However, the same issue re-emerges in the prims.take_meta function:

RuntimeError: Expected index to be a 1-D or 0-D tensor, but index.ndim=2!
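One possible way to satisfy a 1-D-only index check without changing the result is to flatten the index, gather, and reshape. The following is a sketch in plain PyTorch, not Thunder's implementation: take_nd is a hypothetical helper, and torch.index_select stands in for a take-style primitive that accepts only 1-D indices.

```python
import torch


def take_nd(a: torch.Tensor, index: torch.Tensor, dim: int = 0) -> torch.Tensor:
    """Index `a` along `dim` with an N-D integer tensor using only a 1-D gather.

    Hypothetical helper for illustration; assumes a non-negative `dim`.
    """
    # Flatten the index so the underlying gather only ever sees a 1-D tensor.
    flat_index = index.reshape(-1)

    # torch.index_select requires a 1-D index, mirroring the restriction above.
    gathered = torch.index_select(a, dim, flat_index)

    # Replace the gathered dimension with the original index shape.
    out_shape = tuple(a.shape[:dim]) + tuple(index.shape) + tuple(a.shape[dim + 1:])
    return gathered.reshape(out_shape)


# Small check against eager advanced indexing (sizes are arbitrary).
table = torch.arange(40, dtype=torch.float32).reshape(10, 4)
ids = torch.tensor([[3, 1, 4], [1, 5, 9]])  # shape (2, 3)
result = take_nd(table, ids)                # shape (2, 3, 4)
```

Unlike squeezing, this handles index tensors whose extra dimensions are larger than 1, at the cost of two reshapes around the gather.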

Minimal Repro

from transformers import AutoModelForCausalLM, AutoTokenizer
import thunder
import torch

# Define device and model
DEVICE = torch.device('cuda', 0)
model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen2-1.5B-Instruct', torch_dtype=torch.bfloat16, device_map="cuda")

# Compilation with thunder.jit
model = thunder.jit(model)

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")

# Sample input with shape (1, 1024)
inputs = tokenizer("Hello world!", return_tensors="pt").input_ids.to(DEVICE)
inputs = inputs.repeat(1, 1024 // inputs.shape[1])

# Forward pass
output = model(inputs)

results in

  File "/home/mmikulski/miniconda3/envs/report2/lib/python3.10/site-packages/thunder/core/prims.py", line 2882, in take_meta
    utils.check(index.ndim <= 1, lambda: f"Expected index to a 1-D or 0-D tensor, but index.ndim={index.ndim}!")
  File "/home/mmikulski/miniconda3/envs/report2/lib/python3.10/site-packages/thunder/core/baseutils.py", line 103, in check
    raise exception_type(s())
RuntimeError: Expected index to a 1-D or 0-D tensor, but index.ndim=2!

mjmikulski commented 2 months ago

Additional info

(as requested by @tfogal)

I replaced the line model = thunder.jit(model) in the above repro script with the following lines:

from thunder.dynamo import ThunderCompiler
executors = list(thunder.get_default_executors())
backend = ThunderCompiler(executors=executors)
model = torch._dynamo.optimize(backend=backend)(model)

The same error occurs:

RuntimeError: Advanced indexing currently only supports zero or one-dimensional integer tensors, but found a tensor with dtype thunder.dtypes.int64 and 2 dimensions

Additional scope

I also checked a few other models from the Qwen family, and all of them failed:

# Qwen 2
Qwen/Qwen2-1.5B-Instruct (originally reported) - fail
Qwen/Qwen2-0.5B-Instruct - fail
Qwen/Qwen2-7B - fail

# Qwen 1.5
Qwen/Qwen1.5-4B - fail

# Qwen 1
Qwen/Qwen-1.8B-Chat - fail (but different error connected to DispatchKeySet)

So fixing this issue has the potential to enable Qwen 1.5 and Qwen 2 models to work with Thunder.

nvMelissa commented 2 months ago

Triage review 9/16/24: Problem acknowledged. We welcome the community to update the meta function to solve the problem.

IvanYashchuk commented 1 week ago

@riccardofelluga, could you please take this one?