deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself
https://coder.deepseek.com/

TensorRT Quantization Breaks for `LlamaLinearScalingRotaryEmbedding` #117

Open Sanger2000 opened 4 months ago

Sanger2000 commented 4 months ago

In nvidia-ammo, these lines in `ammo/torch/export/layer_utils.py` appear to fail unexpectedly for some Llama variants:

[Screenshot: the name-based `is_linear` check in `ammo/torch/export/layer_utils.py`]

In particular, the DeepSeek models use `LlamaLinearScalingRotaryEmbedding`. Because the class name contains "Linear", the module is picked up by the name-based `is_linear` check and treated as the dense case. However, this module has no `.weight`, so `build_linear_config` fails.
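For illustration, here is a minimal sketch of the failure mode. The `is_linear` and `build_linear_config` bodies below are assumptions that only mirror the behaviour described above, not nvidia-ammo's actual implementation, and the rotary class is a stand-in for the transformers module of the same name:

```python
import torch
import torch.nn as nn


# Stand-in for transformers' LlamaLinearScalingRotaryEmbedding: a rotary
# embedding module that registers a buffer but has no .weight parameter.
class LlamaLinearScalingRotaryEmbedding(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq)


# Assumed shape of the name-based check in ammo/torch/export/layer_utils.py:
# anything whose class name contains "Linear" is treated as a dense layer.
def is_linear(module: nn.Module) -> bool:
    return "Linear" in type(module).__name__


# Assumed shape of the export step: a dense layer is expected to expose .weight.
def build_linear_config(module: nn.Module) -> dict:
    return {"weight": module.weight}


rope = LlamaLinearScalingRotaryEmbedding()
print(is_linear(rope))     # True -- "Linear" appears in the class name
build_linear_config(rope)  # AttributeError: ... object has no attribute 'weight'
```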

There are several easy fixes for this (for example, checking whether "Rotary" is in the class name and skipping that case; see the sketch below). I'm happy to contribute a patch, but I don't think there is an OSS repo to contribute it to.
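A sketch of one such fix, assuming the name-based check above (the function name and logic are my guesses at ammo's internals, not its actual API):

```python
import torch.nn as nn


def is_linear(module: nn.Module) -> bool:
    name = type(module).__name__
    # Skip rotary-embedding modules whose class name happens to contain "Linear",
    # e.g. LlamaLinearScalingRotaryEmbedding.
    if "Rotary" in name:
        return False
    # Alternatively (or additionally), require that the module actually has a weight.
    return "Linear" in name and hasattr(module, "weight")
```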