NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT
Apache License 2.0

cuSPARSELt is slower? #767

Open · BDHU opened this issue 1 year ago

BDHU commented 1 year ago

I modified the sparse attribute in FasterTransformer/examples/cpp/multi_gpu_gpt, expecting it to accelerate inference. However, it turns out to be slower than the dense path regardless of the batch size, and it also consumes much more memory than the dense option, which is counterintuitive. Is there an explanation for this behavior? Thanks.
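
For reference, a minimal sketch of the kind of change described above, assuming the example reads its runtime options from `gpt_config.ini` and toggles the cuSPARSELt 2:4 structured-sparsity path via a `sparse` key; the key names and values below are assumptions and may differ between FasterTransformer versions.

```ini
; examples/cpp/multi_gpu_gpt/gpt_config.ini (sketch; key names are assumptions)
[ft_instance_hyperparameter]
; cuSPARSELt 2:4 structured sparsity targets FP16/INT8 GEMMs
data_type=fp16
; 0 = dense cuBLAS GEMMs, 1 = cuSPARSELt sparse GEMMs
sparse=1
```

Note that the binary presumably also has to be built with sparsity support enabled (a CMake option pointing at a cuSPARSELt installation); otherwise the sparse path is likely not compiled in and the dense GEMMs are used regardless of this flag.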

YixinSong-e commented 1 year ago

Hello, did you try flash-llm?