joerong666 opened this issue 3 months ago
Hi there, I met the same problem. Building PyTorch from source (CUDA 12.4) worked for me. You may give it a try~
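(For anyone attempting this, a minimal sketch of a source build against CUDA 12.4. The clone/`setup.py develop` flow follows the PyTorch README; the CUDA toolkit path is an assumed install location, so adjust it for your machine.)

```bash
# Hypothetical sketch of building PyTorch from source against CUDA 12.4;
# the toolkit path below is an assumption, not the commenter's exact setup.
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
pip install -r requirements.txt

# Point the build at the CUDA 12.4 toolkit (adjust to your install path).
export CUDA_HOME=/usr/local/cuda-12.4

# Build and install in development mode, as described in the PyTorch README.
python setup.py develop
```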
Hi, have you met this error with cuda_graph? #1948
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
Hi @hijkzzz, I met the same problem with FP8 quantization of MoE models (both 8x22B and 8x7B) on H20, even after upgrading to 0.11.0.dev2024060400 (it also fails in v0.12.0.dev2024070900). FP8 quantization of Llama-70B, however, works fine. Here is my environment:
Here is my quantization command:
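(The exact command was not reproduced here; for reference, a TensorRT-LLM FP8 quantization invocation via `examples/quantization/quantize.py` typically has roughly the shape below. The model path and parameter values are hypothetical placeholders, not the values from this report.)

```bash
# Hypothetical FP8 quantization sketch for a Mixtral-style MoE checkpoint;
# paths and values are placeholders, not the original reporter's command.
python examples/quantization/quantize.py \
    --model_dir /path/to/Mixtral-8x7B \
    --dtype float16 \
    --qformat fp8 \
    --kv_cache_dtype fp8 \
    --calib_size 512 \
    --output_dir /path/to/quantized_ckpt
```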
Any suggestion is appreciated!
Below is the detailed error:
Originally posted by @joerong666 in https://github.com/NVIDIA/TensorRT-LLM/issues/1645#issuecomment-2235833820