Closed lynkz-matt-psaltis closed 1 year ago
Hi @lynkz-matt-psaltis ,
Thanks for reporting that issue. I'm going to forward the issue to the AMMO team. I'll let you know when they have some feedback about it.
Thanks, Julien
Hi @lynkz-matt-psaltis, this issue happens when the pre-compiled CUDA extension is not compatible with the CUDA/PyTorch versions on the host.
In your case, we suggest using the source wheel files in the ammo tarball (those without cuxxx in the wheel name), which compile the extension on the fly. Also see: https://github.com/NVIDIA/TensorRT-LLM/issues/126
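For anyone hitting the same thing, here is a rough sketch of how you might pick out the source wheels and check what the host is running. The wheel names below are made up for illustration (the real file names in the tarball may look different), and `source_wheels` and `host_versions` are hypothetical helpers, not part of ammo or TensorRT-LLM. The only library calls used are `torch.__version__` and `torch.version.cuda`.

```python
import re

# Hypothetical wheel names, for illustration only.
wheels = [
    "example_pkg-1.0-cu118-cp310-linux_x86_64.whl",
    "example_pkg-1.0-py3-none-any.whl",
]

# Matches a CUDA tag such as cu118 or cu121 anywhere in the file name.
CUDA_TAG = re.compile(r"cu\d{2,3}")


def source_wheels(names):
    """Return the wheel names that carry no cuXXX tag."""
    return [name for name in names if not CUDA_TAG.search(name)]


def host_versions():
    """Return (torch version, CUDA version torch was built with), or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return torch.__version__, torch.version.cuda


print(source_wheels(wheels))
print(host_versions())
```

If the CUDA version reported here differs from the cuxxx tag of the wheel you installed, that mismatch is a likely candidate for the error above.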
Awesome, thanks so much for that, team! @RalphMao & @jdemouth-nvidia
When attempting to quantize a Phind CodeLlama model, I receive an exception: AttributeError: 'NoneType' object has no attribute 'fake_tensor_quant_with_axis'
Using ammo 3.0. TensorRT-LLM compiled from main branch.