Closed sgsdxzy closed 3 weeks ago
Previously, the weight shapes of `ColumnParallelLinear` were incorrect for the q, k, v layers of some models, because the output size was divided by the tensor-parallel (tp) size twice.
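A minimal sketch of how a double division can arise, assuming the layer itself partitions the output dimension while the caller also passes an already partitioned size. The helper `column_parallel_weight_shape` and the sizes used are hypothetical and only illustrate the arithmetic; they are not the project's actual code.

```python
def column_parallel_weight_shape(input_size: int, output_size: int, tp_size: int) -> tuple:
    """Hypothetical helper: per-rank weight shape (out_features, in_features)
    for a column-parallel linear layer that splits the output dimension."""
    assert output_size % tp_size == 0, "output size must be divisible by tp size"
    return (output_size // tp_size, input_size)


# Illustrative sizes (assumptions, not taken from any specific model).
hidden_size = 4096
tp_size = 4

# Buggy call: the caller divides by tp_size, then the layer divides again.
buggy = column_parallel_weight_shape(hidden_size, hidden_size // tp_size, tp_size)

# Intended call: pass the full output size and let the layer partition it once.
fixed = column_parallel_weight_shape(hidden_size, hidden_size, tp_size)

print(buggy)  # (256, 4096)
print(fixed)  # (1024, 4096)
```

With these numbers, the double division yields a shard with 256 output rows instead of the expected 1024, which is the kind of shape mismatch described above.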