Closed. XapaJIaMnu closed this 2 years ago.
Can you test it first?
Please treat my review as a no-objection. Here's the diff which gets this working locally on my WASM page.
diff --git a/src/tensors/cpu/wasm_intgemm_fallback.cpp b/src/tensors/cpu/wasm_intgemm_fallback.cpp
index ec7be368..ffc159f5 100644
--- a/src/tensors/cpu/wasm_intgemm_fallback.cpp
+++ b/src/tensors/cpu/wasm_intgemm_fallback.cpp
@@ -44,7 +44,12 @@ extern "C" void int8PrepareBFromTransposedFallback(const float* input_B_transpos
Index width,
Index cols_B,
int8_t* output) {
- ABORT("Unimplemented int8PrepareBFromTransposedFallback");
+ intgemm::Int8::PrepareBTransposed(input_B_transposed,
+ output,
+ scale,
+ width,
+ cols_B);
+
}
The argument order here appears to be (width, colsB) rather than (colsB, width), which is what I'm currently using at https://github.com/jerinphilip/arm-playground/pull/12, but that's on me to fix, I suppose...
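For anyone else puzzling over the ordering, here is a minimal standalone sketch of calling intgemm directly. The signature I'm assuming is PrepareBTransposed(input, output, quant_mult, inner, B_untransposed_cols); the sizes, fill value, quantization multiplier, and include paths are illustrative only and may need adjusting to however intgemm is vendored in your tree.

#include <cstdint>
#include "intgemm/intgemm.h"
#include "intgemm/aligned.h"

int main() {
  using intgemm::Index;
  const Index width = 64;   // inner (shared) dimension; 64 should satisfy the 8-bit size requirement
  const Index cols_B = 8;   // columns of the untransposed B; a multiple of 8
  // B is stored transposed: cols_B rows of width floats each.
  intgemm::AlignedVector<float> B_transposed(width * cols_B);
  intgemm::AlignedVector<int8_t> prepared(width * cols_B);
  for (float& v : B_transposed) v = 0.25f;
  const float quant_mult = 127.0f / 2.0f;  // illustrative: 127 / max absolute value of B

  // The inner dimension (width) comes before the untransposed column count (cols_B),
  // i.e. (width, colsB), matching the diff above.
  intgemm::Int8::PrepareBTransposed(B_transposed.begin(), prepared.begin(),
                                    quant_mult, width, cols_B);
  return 0;
}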
I think it should be working now.
@XapaJIaMnu I am testing it now. Btw, how can I make sure that it is the transposed code path that I am testing?
If you are using this branch, I've removed the hacks that call transpose and then prepareB when necessary. You can also check which function is being executed with a simple print statement if you doubt it.
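For example, a throwaway diagnostic like the following (purely illustrative, not part of the patch) at the top of int8PrepareBFromTransposedFallback in wasm_intgemm_fallback.cpp would make the code path obvious in the console; it only needs <cstdio>.

  // Temporary diagnostic: confirm the transposed code path is actually taken. Remove before merging.
  std::fprintf(stderr, "int8PrepareBFromTransposedFallback: width=%u cols_B=%u\n",
               (unsigned)width, (unsigned)cols_B);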
I see only the transposed code path being taken and only prepareBTransposed being called in the logs. The wasm test page works on my end.
Out of curiosity, I don't see prepareB being called at all now. Is it a dead path for most trained models? Do you know of any model that might invoke the prepareB path as well?
PrepareB is called during model loading time, not during translation.
But I don't see it being called at all. Model loading always invokes PrepareBQuantizedTransposed, and inference invokes only PrepareBTransposed (although just once) on my end.
I see. If you submit a .npz file and tell it to be converted to intgemm on the fly, PrepareB will be called, but I guess we don't do that since we only have binary formats.
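For the record, the three intgemm entry points mentioned in this thread differ roughly as follows; the parameter lists are how I read intgemm's header and should be treated as my understanding rather than anything authoritative.

  // Untransposed float B: quantize and rearrange (the .npz-converted-on-the-fly case).
  intgemm::Int8::PrepareB(B, prepared_B, quant_mult, width, cols_B);

  // Transposed float B: quantize and rearrange (the fallback this PR implements).
  intgemm::Int8::PrepareBTransposed(B_transposed, prepared_B, quant_mult, width, cols_B);

  // Transposed, already-quantized int8 B: rearrange only (binary model loading).
  intgemm::Int8::PrepareBQuantizedTransposed(B_quantized_transposed, prepared_B, width, cols_B);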
Apologies for the delay @jerinphilip, @abhi-agg. I have implemented the prepareBTransposed code path and tested it for the non-WASM code path. Could you test it for the WASM code path if you have it set up somewhere?
Thanks
Nick
Checklist