I enabled the sparse attribute in FasterTransformer/examples/cpp/multi_gpu_gpt, expecting it to accelerate inference. However, it turned out to be slower than the dense alternative regardless of batch size, and it also consumed much more memory than the dense option, which is counterintuitive. Is there an explanation for this behavior? Thanks
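For reference, this is roughly the change I made, sketched against the example's config file; the exact file and field names are from memory and may differ in your FasterTransformer version:

```ini
; examples/cpp/multi_gpu_gpt/gpt_config.ini (hypothetical excerpt)
[ft_instance_hyperparameter]
; 1 = use structured-sparsity GEMM kernels, 0 = use dense GEMMs
sparse=1
```

With `sparse=0` the same run is both faster and uses less memory on my machine, which is the opposite of what I expected.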