Closed abhi-agg closed 3 years ago
@kpu @XapaJIaMnu I have made this change for one function (int8PrepareA) of the matrix multiply interface initially to get early feedback from you. I can make similar changes for the rest of the functions of this interface after your feedback.
For int8PrepareA, I have made the following changes (based on my understanding): the intgemm::Int8Shift::PrepareA and intgemm::Int8::PrepareA calls are routed to int8PrepareA for the wasm target, and 0.0 is being passed as the zero point currently. Please let me know if this needs to change and I will make the required changes.
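To make the routing concrete, here is a rough sketch; the int8PrepareA parameter order and the wrapper name are assumptions for illustration, not the exact declarations in this PR:

#include <cstdint>
#include "intgemm/intgemm.h"  // non-wasm targets keep calling intgemm directly

// Assumed declaration of the wasm interface function; the parameter order is
// illustrative, not the exact signature.
extern "C" void int8PrepareA(const float* input, float scale, float zero_point,
                             int rows, int width, int8_t* output);

// Hypothetical wrapper showing the split described above.
inline void prepareA(const float* in, int8_t* out, float quant_mult,
                     int rows, int width) {
#if defined(WASM)
  // wasm target: route to the new interface, 0.0 as zero point for now
  int8PrepareA(in, quant_mult, /*zero_point=*/0.0f, rows, width, out);
#else
  // other targets: unchanged, intgemm is used directly
  intgemm::Int8::PrepareA(in, out, quant_mult, rows, width);
#endif
}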
This looks like a bunch of WASM binding code. I'm not sure how to review it.
One comment is the fallback or not should be atomic for the entirety of the API. Mixing a native prepareA with a fallback WASM 128 bit multiply won't work because they can have different opaque representations.
Perhaps use the existing CPU model dispatch mechanism in intgemm.cc for your polyfill? Just sense the added functions, then treat it like a CPU model.
Also, the fallback should be easy to implement by calling the existing function rather than an abort.
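A minimal sketch of that idea, with illustrative names (the real dispatcher in intgemm.cc differs): sense the wasm intrinsics once, treat them as one more CPU model, and make the fallback entry point at the existing implementation rather than an abort.

#include <cstdint>

// Hypothetical probe for the browser-provided gemm functions.
bool HasWasmGemmIntrinsics();

// One function table per "CPU model"; the wasm intrinsics become just another
// entry, so PrepareA from one backend can never be mixed with Multiply from
// another (their opaque representations may differ).
struct Backend {
  void (*PrepareA)(const float* in, int8_t* out, float quant_mult, int rows, int cols);
  void (*Multiply)(const int8_t* A, const int8_t* B, float* C, int rows, int width, int cols);
};

extern const Backend kGenericBackend;  // existing portable code path (the fallback)
extern const Backend kWasmBackend;     // thin wrappers over the JS intrinsics

const Backend& ChooseBackend() {
  // Sensed once, like a CPUID check; the fallback calls existing code, not abort().
  static const Backend& chosen =
      HasWasmGemmIntrinsics() ? kWasmBackend : kGenericBackend;
  return chosen;
}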
This looks like a bunch of WASM binding code. I'm not sure how to review it.
That's fine. I guess you can ignore that part. I will add @yurydelendik and @eqrion to review that part of the code.
One comment is the fallback or not should be atomic for the entirety of the API. Mixing a native prepareA with a fallback WASM 128 bit multiply won't work because they can have different opaque representations.
I didn't follow you here. It is a bit unclear to me where I am mixing the native prepareA with a fallback WASM 128 bit multiply. Could you please point out? That would be helpful.
As I already mentioned, I made this change just for one function (int8PrepareA) of the matrix multiply interface to get early feedback from you. I can make similar changes for the rest of the functions of this interface after your feedback.
So, either the native implementation will be called for the entire wasm matrix multiplication interface or the fallback implementation will be called.
Using a preprocessor directive, I separated the code for the wasm target from the non-wasm targets for matrix multiplication. The non-wasm targets are still unchanged (i.e. they use intgemm directly), while for the wasm target I am now calling the new matrix multiply interface. Specifically, this is the part of the code where I am looking for your feedback.
I think Marian is the wrong layer of abstraction for this. Why not just hide it behind the dispatcher in intgemm, treating the presence of JS intrinsics as a CPU model?
https://github.com/kpu/intgemm/blob/master/intgemm/intgemm.cc
That way you get a matrix multiply library that happens to run faster if there's a JS API. And MXNet will just happen to get a faster matrix multiply too.
PrepareA just quantizes so it's compatible across architectures anyway. But e.g. if you had a native PrepareB with CPUID dispatch that picked up 256-bit AVX2 and a WASM Multiply that does 128-bit you'll get wrong results. The function-level polyfill design you have here carries that risk whereas something that checks for all functions to decide what to use does not carry that risk. That said, I presume Firefox would implement all-or-nothing, so the risk is low.
I think Marian is the wrong layer of abstraction for this. Why not just hide it behind the dispatcher in intgemm, treating the presence of JS intrinsics as a CPU model?
https://github.com/kpu/intgemm/blob/master/intgemm/intgemm.cc
That way you get a matrix multiply library that happens to run faster if there's a JS API. And MXNet will just happen to get a faster matrix multiply too.
I am confused now. If I hide everything behind the dispatcher in intgemm, that would mean I can just throw away the gemm interface that we recently designed. There are still some subtle differences between the intgemm interface and the one that we recently designed (callbacks being one of them). With these changes in place, it is difficult to abstract at the intgemm level.
Additionally, Marian can link and work with different gemm libraries (intgemm, fbgemm). Wouldn't Marian be the right place to link and use another gemm library that runs efficiently on the wasm target? This also keeps open the possibility of using different backend implementations if one doesn't want to link to intgemm at all.
PrepareA just quantizes so it's compatible across architectures anyway. But e.g. if you had a native PrepareB with CPUID dispatch that picked up 256-bit AVX2 and a WASM Multiply that does 128-bit you'll get wrong results. The function-level polyfill design you have here carries that risk whereas something that checks for all functions to decide what to use does not carry that risk. That said, I presume Firefox would implement all-or-nothing, so the risk is low.
As you also mentioned, Firefox will implement all-or-nothing (all of us agreed in the beginning that implementing only a subset of the gemm functions wouldn't give us the performance gain that we want). In this scenario, we will not have the risk that you mentioned.
On a side note, I have updated this PR to refactor marian code for the rest of the functions of gemm interface as well.
If I hide everything behind the dispatcher in intgemm, that would mean I can just throw away the gemm interface that we recently designed. There are still some subtle differences between the intgemm interface and the one that we recently designed (callbacks being one of them). With these changes in place, it is difficult to abstract at the intgemm level.
Yes, exactly. Now you understand why I was protesting an unnecessary redesign of the interface. If you want an intgemm interface you can have it. If you want a matrix multiply interface that's standards-ready for others to use, implement WebNN. The middle does not serve much purpose.
Sorry for chiming in late. I don't fully understand lots of the concerns mentioned above, probably because different terminology is used.
I think Marian is the wrong layer of abstraction for this.
Maybe or maybe not. The ideal location for the fallback implementation is a separate wasm module, i.e. a completely new project that you would have to start and maintain as a dependency. Keeping the fallback logic in Marian for the first iteration will also save you the time you would otherwise spend dealing with wasm memory instantiation when multiple wasm modules are used.
Now you understand why I was protesting an unnecessary redesign of the interface.
I don't think we are redesigning an interface. We are building an interface suitable for wasm intrinsics. You will continue to use the existing interface outside the wasm target.
If I hide everything behind the dispatcher in intgemm, that would mean I can just throw away the gemm interface that we recently designed. There are still some subtle differences between the intgemm interface and the one that we recently designed (callbacks being one of them). With these changes in place, it is difficult to abstract at the intgemm level.
Yes, exactly. Now you understand why I was protesting an unnecessary redesign of the interface. If you want an intgemm interface you can have it. If you want a matrix multiply interface that's standards-ready for others to use, implement WebNN. The middle does not serve much purpose.
As I have previously stated a few times, the current API is not a redesign of intgemm but an interface that is suitable for intrinsics. This mainly involved changing intgemm's callbacks/function-pointers style to plain arguments. We had to do it because there is currently no good way to express a function pointer there (@yurydelendik can confirm this).
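To illustrate the callbacks-versus-arguments point, a rough sketch with made-up signatures (neither the exact intgemm declarations nor the exact wasm interface):

#include <cstdint>

// intgemm style: the write-out / unquantize step is a callback object.
template <class Callback>
void MultiplyWithCallback(const int8_t* A, const int8_t* B,
                          int rows, int width, int cols, Callback write_out);

// intrinsics-friendly style: no callbacks cross the boundary, so the
// unquantization multiplier and the output buffer are plain arguments.
extern "C" void int8MultiplyAndAddBias(const int8_t* A, float scale_A,
                                       const int8_t* B, float scale_B,
                                       const float* bias, float unquant_mult,
                                       int rows, int width, int cols,
                                       float* output);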
Additionally, Marian can link and work with different gemm libraries (intgemm, fbgemm). Wouldn't Marian be the right place to link and use another gemm library that runs efficiently on the wasm target? This also keeps open the possibility of using different backend implementations if one doesn't want to link to intgemm at all.
There are currently two layers of dispatch: Marian and CPUID in intgemm.
In Marian, you manually configure which library (intgemm or fbgemm) you want in the Options object, and they have peculiar behaviors like whether they handle the output matrix. There is currently no automatic dispatch mechanism that switches between the two.
In intgemm it favors CPUID dispatch but one can call the routines directly with the class for that CPUID. The templates would admittedly try to force the existence of functions you haven't implemented (and don't need for our particular case).
This appears to be inserting a third layer in between, with dispatch controlled by a compiler predefine. Is there any means of sensing browser support, or will it always be two different builds? If it is always two different builds, is WASM the correct define for this switch?
To give a different example, with the current wormhole instruction, I could (but have not gotten around to) run the wormhole instruction, inspect the output, and then use that as a CPUID of sorts to dispatch different intgemm implementations. Is there an analogous way to sense the presence of an optimized implementation here or is it always two different builds?
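A minimal sketch of that run-it-and-inspect-the-output sensing, with a hypothetical probe function (the real wormhole check would execute the actual instruction sequence and compare against the optimized semantics):

#include <cstdint>

// Hypothetical probe: runs the candidate fast path on fixed inputs.
extern "C" int32_t ProbeOptimizedMultiply(int8_t a, int8_t b);

enum class GemmPath { kFallback, kOptimized };

GemmPath SenseGemmPath() {
  // If the engine pattern-matched the optimized sequence, the probe returns the
  // expected value; use that like a CPUID bit, otherwise take the portable path.
  return ProbeOptimizedMultiply(3, 7) == 21 ? GemmPath::kOptimized
                                            : GemmPath::kFallback;
}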
Hi @kpu, we think we should have a meeting to clarify these issues. I'm sending you an email to have it scheduled. Thanks.
@kpu @XapaJIaMnu This PR is up for review. I made all the changes as per our meeting and updated the description of this PR to summarize those changes.
@kpu @XapaJIaMnu Could you folks please review the PR?
@abhi-agg We good? I think the changes should be quick.
There are places where code was copied and it looks like a narrower ifdef would have avoided copying code.
@kpu Actually, I used a narrower ifdef in the beginning but it caused a compilation failure on the Windows platform, so I had to do it this way.
Can you be more specific about the problem with Windows? Something to do with an unused typedef?
#if defined(WASM)
  return {NodeOp(
    quantMult_ = *child(1)->val()->data();
    // snip
#else
  typedef typename intgemm_<vtype>::type Integer;
  return {NodeOp(
    quantMult_ = *child(1)->val()->data();
    typedef typename intgemm_<vtype>::type Integer;
Can you be more specific about the problem with Windows? Something to do with an unused typedef?
The error is not related to an unused typedef. It is the following:
error C2121: '#': invalid character: possibly the result of a macro expansion
I found the explanation here: https://docs.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2121?view=msvc-160
So, it is about using a preprocessor directive inside the NodeOp macro expansion. I believe that is why even the COMPILE_CPU preprocessor directive has a corresponding else part which is empty.
I had found https://stackoverflow.com/questions/28187619/using-preprocessor-conditionals-in-macro-functions-with-double-parentheses but this solution doesn't really solve the problem of code duplication.
If you have a suggestion then I am all ears.
NodeOp is a very simple macro.
#define NodeOp(op) [=]() { op; }
Just expand it then use ifdefs inside.
https://github.com/marian-nmt/marian-dev/blob/c29cc83dc4d234f0e3a00a46a729053132b408b8/src/graph/chainable.h#L11
(And don't feel bad about abstraction, there are already places in the code doing that)
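A small self-contained sketch of that expansion (illustrative, not Marian's real node code): the C2121 comes from a #if inside a macro argument, so NodeOp is expanded by hand and the #if moves into the lambda body.

#include <functional>
#include <iostream>

#define NodeOp(op) [=]() { op; }  // same shape as src/graph/chainable.h

int main() {
  float quantMult = 0.f;
  const float input = 2.5f;

  // Ill-formed on MSVC (C2121): a preprocessor directive inside NodeOp(...):
  //   auto op = NodeOp(
  //   #if defined(WASM)
  //     quantMult = input;
  //   #endif
  //   );
  //
  // Expanded by hand, the directive is no longer inside a macro argument:
  std::function<void()> op = [&]() {
#if defined(WASM)
    quantMult = input;         // wasm-specific branch (illustrative)
#else
    quantMult = input * 2.0f;  // native branch (illustrative)
#endif
  };

  op();
  std::cout << quantMult << "\n";
  return 0;
}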
@kpu I have pushed the new commits to narrow the ifdef scopes.
Description
This PR resolves all but the last task of issue https://github.com/browsermt/bergamot-translator/issues/204
List of changes: For generic matrix multiply interface:
Please note that:
- the changes are for the --int8shiftAlphaAll configuration
- 0.0 is being passed as zero point everywhere
Added dependencies: none
How to test
Follow the README instructions to run the wasm test page and observe the translated sentences in browser console logs. This confirms that the matrix-multiply implementation is imported from different wasm modules.
Checklist