Closed lishicheng1996 closed 2 months ago
@Tracin Could you please have a look? Thanks
@lishicheng1996 Basically, this is because the sparse kernels are less efficient than the dense ones here, so they are not chosen. Can you verify it on H100? I do not see any plan for FP8 sparsity on 4090.
Thank you!
Hi! I tried FP8 sparsity with Llama-3-8b on an RTX 4090, but didn't get any performance improvement. I checked the trt-llm build log, which shows that although some layers are eligible to use sparse tactics, those tactics are not chosen.
I see a sparsity example on H100 in the benchmark. I'm wondering why it doesn't work on 4090. Also, is 4090 support planned on the roadmap?
Thanks!
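For anyone who wants to repeat the build-log check described above, here is a minimal sketch. It assumes the build output was saved to a text file and that the relevant lines contain the word "sparse". The exact log wording is not shown in this thread, so the keyword and the helper name `find_keyword_lines` are only illustrative assumptions, not part of trt-llm.

```python
import re
from typing import Iterable, List


def find_keyword_lines(lines: Iterable[str], keyword: str = "sparse") -> List[str]:
    """Return the lines that mention `keyword`, ignoring case.

    The default keyword is an assumption about what the build log prints;
    change it to match the wording in your own log.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


# Hypothetical usage with a saved log file (the path is a placeholder):
#
#     with open("build.log", encoding="utf-8", errors="replace") as f:
#         for hit in find_keyword_lines(f):
#             print(hit)
```

This only filters the log down to the lines worth reading. Whether a sparse tactic was actually selected still has to be judged from what those lines say.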