a_sh_rd_delta_o

IST-DASLab / marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Apache License 2.0

575 stars, 45 forks

Open Lenan22 opened 5 months ago

Lenan22 commented 5 months ago

constexpr int a_sh_rd_delta_o = 2 * ((threads / 32) / (thread_n_blocks / 4));
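
For readers following the expression above, here is a minimal sketch that evaluates it at compile time. The helper function and the sample values (256 threads, `thread_n_blocks` of 4, 8, or 16) are assumptions chosen for illustration and are not taken from the issue. Both divisions are integer divisions, and `threads / 32` can be read as the number of warps in the block, given the 32-thread warp size.

```cpp
// Hypothetical helper that mirrors the expression quoted in the issue.
// Both divisions are integer divisions, so thread_n_blocks must be at
// least 4; otherwise (thread_n_blocks / 4) is 0 and the outer division
// is ill-formed in a constant expression.
constexpr int a_sh_rd_delta_o(int threads, int thread_n_blocks) {
    return 2 * ((threads / 32) / (thread_n_blocks / 4));
}

// Assumed sample values (not from the issue): 256 threads = 8 warps.
static_assert(a_sh_rd_delta_o(256, 16) == 4,  "2 * (8 / 4)");
static_assert(a_sh_rd_delta_o(256, 8)  == 8,  "2 * (8 / 2)");
static_assert(a_sh_rd_delta_o(256, 4)  == 16, "2 * (8 / 1)");
```

Under these assumed values, the result doubles each time `thread_n_blocks` is halved.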