Is there a reason why GEMM stores the tensors reversed, as (in, out) instead of (out, in)?

casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:

https://casper-hansen.github.io/AutoAWQ/

MIT License

Closed (vince62s closed this issue 9 months ago)

vince62s commented 9 months ago

It makes things difficult when we want to handle tensor parallelism.

casper-hansen commented 9 months ago

This is just how the GEMM kernel was implemented. I suspect it made it easier to implement.
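
For readers wondering why the layout matters for tensor parallelism, here is a minimal sketch using plain nested lists rather than real quantized tensors. The helper names (`shard_output_dim`, `transpose`) and the toy weight are hypothetical and not part of AutoAWQ. It only illustrates that splitting along the output dimension means slicing rows in an (out, in) layout but slicing columns in an (in, out) layout, so sharding code written for one layout has to be adapted for the other.

```python
def transpose(matrix):
    """Swap the two axes of a 2-D nested list."""
    return [list(col) for col in zip(*matrix)]


def shard_output_dim(weight, rank, world_size, layout):
    """Return this rank's slice of a 2-D weight, split along the output dimension.

    layout="out_in": rows are output features (the usual dense linear layout).
    layout="in_out": rows are input features (the reversed layout discussed here).
    Hypothetical helper for illustration only; it ignores bit packing and scales.
    """
    if layout == "out_in":
        step = len(weight) // world_size
        return weight[rank * step:(rank + 1) * step]  # slice rows
    if layout == "in_out":
        step = len(weight[0]) // world_size
        return [row[rank * step:(rank + 1) * step] for row in weight]  # slice columns
    raise ValueError(f"unknown layout: {layout}")


# Toy weight with out_features=4 and in_features=2.
w_out_in = [[10 * o + i for i in range(2)] for o in range(4)]
w_in_out = transpose(w_out_in)

shard_a = shard_output_dim(w_out_in, rank=1, world_size=2, layout="out_in")
shard_b = shard_output_dim(w_in_out, rank=1, world_size=2, layout="in_out")

# Both shards hold the same output features, just stored with swapped axes.
assert transpose(shard_b) == shard_a
```

With 4-bit packing and group-wise scales, a split point additionally has to respect the packed and grouped units, which this sketch leaves out.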