OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Inference failed with "axis 2 has dimension xxxx but expected yyyy" error #1769

Open GangLiCN opened 3 months ago

GangLiCN commented 3 months ago

I tried to use CTranslate2 as the inference framework for model inference, but it failed with the following error: "axis 2 has dimension 8192 but expected 7680"

What I've done:

  1. First, I converted the model to a CT2 model. Because of the large model size, I used the quantization parameter to reduce the model file size: `converter.convert(output_dir, quantization="int8", force=True)` (see the conversion sketch below).

  2. Then I loaded the quantized model and ran inference, but unfortunately hit the error above: "axis 2 has dimension 8192 but expected 7680"

How can I fix it?
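For reference, the conversion step (item 1 above) was roughly the following. This is a minimal sketch reconstructed from the call above; the Hugging Face model id google/gemma-2-9b-it and the output directory name are assumptions based on the inference code below.

```python
import ctranslate2

# Assumption: converting from the Hugging Face checkpoint google/gemma-2-9b-it
converter = ctranslate2.converters.TransformersConverter("google/gemma-2-9b-it")

# int8 quantization to shrink the converted model files
output_dir = "gemma-2-9b-it-ct2"
converter.convert(output_dir, quantization="int8", force=True)
```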

The inference code snippet is below:

```python
import ctranslate2
from transformers import AutoTokenizer

# Tokenizer for the original Hugging Face checkpoint
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")

try:
    # Load the quantized model as a Generator
    generator = ctranslate2.Generator("gemma-2-9b-it-ct2", device="cpu")

    # Prepare the input
    input_text = "Translate this to French: Hello, world!"
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(input_text))

    # Run inference with the generate_batch method
    results = generator.generate_batch([tokens], max_length=50, sampling_topk=1)

    # Decode and print the results
    for result in results:
        output_tokens = result.sequences[0]
        output_text = tokenizer.decode(tokenizer.convert_tokens_to_ids(output_tokens))
        print(f"Input: {input_text}")
        print(f"Output: {output_text}")

except Exception as e:
    print(f"Error during model loading or inference: {e}")
```

GangLiCN commented 3 months ago

Tried another model ("gemma-2-2b-it") and hit the same error.

The output below is from Claude 3.5 Sonnet's response:

This error message suggests that the problem may not be limited to large models, but rather reflects an overall compatibility issue between CTranslate2 and the Gemma model series. Let's analyze this in depth.

Dimension mismatch problem: The error message "axis 2 has dimension 4096 but expected 4352" indicates that the model structure expected by CTranslate2 is inconsistent with the actual Gemma model structure. This may be because Gemma models use special architectural features that CTranslate2 has not fully adapted to yet.

CTranslate2's support for Gemma: It's likely that CTranslate2 doesn't fully support the Gemma model series yet. Gemma is a relatively new model family and may use features that CTranslate2 hasn't adapted to.

Solutions and suggestions:

  1. Check CTranslate2 version and documentation:

Ensure you're using the latest version of CTranslate2 and carefully review its documentation for any special instructions or known issues regarding Gemma models.

  2. Report the issue to CTranslate2 developers:

Create an issue on CTranslate2's GitHub repository, detailing the problems you've encountered, including the model version used and error messages. This may prompt developers to add support for Gemma models.

  3. Explore alternative optimization methods:

Consider using other optimization frameworks such as ONNX Runtime or TensorRT. These frameworks may have better support for Gemma models.

  4. Custom conversion script:

If you're familiar with model structures and PyTorch, consider writing a custom script to convert the model, ensuring correct dimension matching. This requires a deep understanding of Gemma model architecture and CTranslate2's expected input.

  5. Use native PyTorch optimization:

If other methods are not feasible, consider using PyTorch's native optimization techniques such as torch.compile(), quantization, or model pruning (a sketch follows at the end of this comment).

  6. Monitor CTranslate2 updates:

Keep a close eye on CTranslate2 updates, as they may add full support for Gemma models in future versions.

  7. Community support: Inquire in relevant technical forums or communities (such as the Hugging Face forums) whether other users have successfully used CTranslate2 to optimize Gemma models; they may have some unique solutions.

  8. Contact Google or Gemma model maintainers directly:

Consider contacting the Gemma model development team to inquire if they have recommended optimization methods or known compatibility issues with CTranslate2.

Summary: It appears to be an overall compatibility issue between CTranslate2 and the Gemma model series, not just a problem with large or small models. Before CTranslate2 fully supports Gemma, it may be necessary to explore other optimization methods or temporarily use the original PyTorch implementation.

If you decide to continue exploring this issue or try other optimization methods, I'd be happy to provide more specific technical advice. Please feel free to share your thoughts or next steps.
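For reference, a minimal sketch of the native PyTorch fallback mentioned in item 5 might look like the following. It assumes the same google/gemma-2-9b-it checkpoint used above and is untested here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # assumption: same checkpoint as above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# JIT-compile the forward pass with PyTorch's native compiler
model = torch.compile(model)

inputs = tokenizer("Translate this to French: Hello, world!", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```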

shiroi-bara commented 3 months ago

Sadly, the Gemma-2 model line is not officially supported by CTranslate2 yet. You can try another supported model. Checking the python/ctranslate2/converters/transformers.py file, I would say the closest in performance to Gemma-2 would be either the Phi-3 or Llama-3.1 lines.
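For example, converting one of the supported lines looks like the following. This is a minimal sketch; the model id meta-llama/Llama-3.1-8B-Instruct is only an example, and gated checkpoints require Hugging Face authentication.

```python
import ctranslate2

# Sketch: convert a supported model line instead of Gemma-2.
# The model id is an example; any architecture handled in
# python/ctranslate2/converters/transformers.py should work.
converter = ctranslate2.converters.TransformersConverter(
    "meta-llama/Llama-3.1-8B-Instruct"
)
converter.convert("llama-3.1-8b-instruct-ct2", quantization="int8", force=True)
```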