zkf331 opened this issue 5 months ago
I am attempting to perform W8A8 quantization using the int8FusedDequantizeCUDA operator, but the inference results are NaN. The code is as follows:
Modifications in qlinear.py:
```python
qint_x = shared_input.qint_x                            # qint_x shape: [M, K]
int_weight = self.int_weight                            # int_weight shape: [N, K]
scale_row = shared_input.meta[None, 0::2].contiguous()  # scale_row shape: [1, M]
zero_row = shared_input.meta[None, 1::2].contiguous()   # zero_row shape: [1, M]
weights_scales = self.weights_scales.transpose(0, 1)    # weights_scales shape: [1, N]
reduced_w = self.reduced_w                              # reduced_w shape: [1, N]
shift_value = 128.0
output = quik.asymmetric.int8FusedDequantize(
    qint_x, int_weight, scale_row, weights_scales,
    shift_value, zero_row, reduced_w, fp_result)
```
Is there an issue with the operator itself, or am I using it incorrectly? Could you please provide some suggestions? Thank you very much.
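Not part of the original report, but one way to narrow this down is to validate every operand before the fused kernel runs: a NaN or Inf in any of the scale/zero vectors, an unexpected dtype, or a non-contiguous layout would each explain NaN output. The sketch below reuses the variable names from the snippet above; the expected dtypes (int8 for the matmul operands, fp16 for the scale/zero/reduction vectors) are an assumption, not confirmed against the QUIK source.

```python
import torch

def check_operand(name, t, expected_dtype=None):
    """Print basic diagnostics for one kernel operand.

    The expected dtypes passed below are assumptions about what the
    fused CUDA kernel wants, not taken from the QUIK source.
    """
    f = t.float()  # cast so isnan/isinf/min/max work uniformly for int8 inputs
    print(f"{name}: shape={tuple(t.shape)}, dtype={t.dtype}, "
          f"contiguous={t.is_contiguous()}, "
          f"nan={torch.isnan(f).any().item()}, inf={torch.isinf(f).any().item()}, "
          f"min={f.min().item():.4g}, max={f.max().item():.4g}")
    if expected_dtype is not None and t.dtype != expected_dtype:
        print(f"  warning: dtype is {t.dtype}, expected {expected_dtype}")

check_operand("qint_x", qint_x, torch.int8)
check_operand("int_weight", int_weight, torch.int8)
check_operand("scale_row", scale_row, torch.float16)
check_operand("zero_row", zero_row, torch.float16)
check_operand("weights_scales", weights_scales, torch.float16)
check_operand("reduced_w", reduced_w, torch.float16)
check_operand("fp_result", fp_result)
```

One detail that stands out in the snippet: `self.weights_scales.transpose(0, 1)` returns a non-contiguous view, and a CUDA kernel that walks raw pointers can read the wrong values from such a view. Appending `.contiguous()` there is a cheap experiment.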