In nvidia-ammo, it appears these lines in `ammo/torch/export/layer_utils.py` fail unexpectedly for some Llama variants:

In particular, the deepseek models use `LlamaLinearScalingRotaryEmbedding`. This means the module is picked up by the `is_linear` check and treated as the dense case. However, this module has no `.weight`, so `build_linear_config` fails.

There are lots of easy fixes for this (for example, checking whether "Rotary" is in the class name and skipping that case). I'm happy to contribute one, but I don't think there is an OSS repo to contribute to.
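To illustrate the failure mode, here is a minimal sketch. The `is_linear` and `build_linear_config` bodies below are hypothetical reconstructions (the real ammo internals differ), and the stand-in class only mimics the relevant trait of the transformers module: its name contains "Linear" but it has no `.weight`.

```python
class LlamaLinearScalingRotaryEmbedding:
    """Stand-in for the transformers rotary-embedding module: no .weight attribute."""
    pass


def is_linear(module) -> bool:
    # Hypothetical: a name-based check like this would misfire, since
    # "Linear" is a substring of "LlamaLinearScalingRotaryEmbedding".
    return "Linear" in type(module).__name__


def build_linear_config(module) -> dict:
    # Hypothetical: the dense path assumes .weight exists, so a rotary
    # module reaching this point raises AttributeError.
    return {"weight_shape": tuple(module.weight.shape)}


def safe_build(module):
    # Proposed guard: skip rotary-embedding modules before the dense path.
    name = type(module).__name__
    if is_linear(module) and "Rotary" not in name:
        return build_linear_config(module)
    return None
```

With this guard, a rotary-embedding module is recognized by `is_linear` but never reaches `build_linear_config`, while genuine linear layers are still exported.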