browsermt / bergamot-translator

Cross platform C++ library focusing on optimized machine translation on the consumer-grade device.
http://browser.mt
Mozilla Public License 2.0

Compile WASM release also only for generic CPU capabilities (without SIMD). #418

Open polkovnikov opened 2 years ago

polkovnikov commented 2 years ago

Feature request. I think the option described in the title would be very useful, because if I open your worker in a browser (like this page) it fails with: Uncaught (in promise) RuntimeError: abort(CompileError: WebAssembly.instantiate(): Wasm SIMD unsupported @+1087). Build with -s ASSERTIONS=1 for more info. at abort (bergamot-translator-worker.js:651:10) at bergamot-translator-worker.js:724:4.

It means my CPU is too old and doesn't support the required CPU capabilities, maybe AVX or something. My CPU has SSSE3 at most (it has no SSE4...).

I think it would be beneficial (even if it is slow) to also compile for older CPUs, targeting only SSE2, because any x64 CPU is guaranteed to have SSE2, and not relying on anything beyond SSE2.

Maybe even compile without any SIMD at all (without SSE/SSE2 too), using only generic CPU instructions.

wazoox commented 1 year ago

I concur. Most of my machines are Phenom II or equivalent, and I have had zero need to upgrade to better CPUs so far; they are simply fast enough. Think about the planet: we must keep running old machines as long as possible.

XapaJIaMnu commented 1 year ago

Old machines use more electricity to do the same amount of work. The same is true for using the non-SIMD path on newer CPUs. Having a legacy option is always good, but this code path shouldn't be forced on newer machines, which have a much more efficient hardware path.

wazoox commented 1 year ago

We're not asking to force non-optimal code on new machines; we're asking for a legacy option for older machines that generally do the job. I can run current OSes and applications on my 8 to 10 year-old machines perfectly adequately; this particular code is currently the only one I'm aware of that doesn't work at all, even slowly. If I were proficient in C++ and understood a thing about WASM I'd have a shot at it, but unfortunately it's way out of my league :)

This looks like it could help: https://www.libvolk.org/

wazoox commented 1 year ago

It looks like it can be done, according to this LLVM documentation: https://releases.llvm.org/7.1.0/tools/clang/docs/AttributeReference.html#target-gnu-target
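To make the linked `target` attribute concrete, here is a minimal sketch of per-function dispatch on native x86 with GCC or Clang. It is not code from bergamot-translator or intgemm: the function names (`abs_sum`, `abs_sum_generic`, `abs_sum_ssse3`) and the macro `SKETCH_HAVE_SSSE3_PATH` are hypothetical, and the kernel is a toy chosen only because it uses one SSSE3 instruction. It also assumes a native build; the attribute selects x86 instruction sets and does not by itself change what a WebAssembly build emits.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: GCC or Clang on x86. Elsewhere only the generic path is built.
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAVE_SSSE3_PATH 1
#endif

// Portable fallback: plain scalar code that any CPU can run.
std::int32_t abs_sum_generic(const std::int8_t* data, std::size_t n) {
  std::int32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const std::int32_t v = data[i];
    sum += v < 0 ? -v : v;
  }
  return sum;
}

#ifdef SKETCH_HAVE_SSSE3_PATH
// Only this function is compiled with SSSE3 enabled; the rest of the
// translation unit stays at the baseline instruction set.
__attribute__((target("ssse3")))
std::int32_t abs_sum_ssse3(const std::int8_t* data, std::size_t n) {
  std::int32_t sum = 0;
  std::size_t i = 0;
  const __m128i zero = _mm_setzero_si128();
  for (; i + 16 <= n; i += 16) {
    const __m128i v =
        _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
    const __m128i a = _mm_abs_epi8(v);        // SSSE3: per-byte absolute value
    const __m128i s = _mm_sad_epu8(a, zero);  // SSE2: two sums of 8 bytes each
    sum += _mm_cvtsi128_si32(s) + _mm_extract_epi16(s, 4);
  }
  return sum + abs_sum_generic(data + i, n - i);  // scalar tail
}
#endif

// Runtime dispatch: use the SSSE3 path only if the CPU reports support.
std::int32_t abs_sum(const std::int8_t* data, std::size_t n) {
#ifdef SKETCH_HAVE_SSSE3_PATH
  if (__builtin_cpu_supports("ssse3")) return abs_sum_ssse3(data, n);
#endif
  return abs_sum_generic(data, n);
}
```

Callers only ever use `abs_sum`; on a CPU without SSSE3 the check fails and the scalar path runs instead of the program aborting.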

XapaJIaMnu commented 1 year ago

You can use translateLocally, which has generic x86 builds; just download any of the compat releases: https://translatelocally.com/ What exactly do you need? Do you need a Python build compiled for the generic x86 architecture, or the wasm module itself?

@graemenail @jelmervdl for translateLocally, we do this by providing -DBUILD_ARCH=x86-64, but afaik wasm is x86? Do we use the x86 as a target?

jelmervdl commented 1 year ago

I think intgemm needs at least SSE2 instructions to even compile, and to enable SSE2 instructions in emscripten you need to compile the wasm binary with WASM SIMD instructions. And that's the problem: on older hardware Chrome and Firefox do not support WASM SIMD instructions at all, and there is no supported subset of WASM SIMD that corresponds to SSE2[^1].

For us to be able to compile the wasm version without WASM SIMD instructions, we'd need to add kernel implementations to intgemm that don't rely on SSE2.
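As a rough illustration of what a kernel that doesn't rely on SSE2 could look like, here is a minimal scalar 8-bit matrix multiply with 32-bit accumulation. It is only a sketch: `gemm_int8_generic` is a hypothetical name, the row-major layout is an assumption, and it does not reflect intgemm's actual interfaces or its prepared-matrix formats.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical generic kernel: C (rows x cols) = A (rows x inner) * B (inner x cols).
// All matrices are row-major. Inputs are signed 8-bit; products are accumulated
// in 32 bits. Uses only scalar operations, so it needs no SIMD support.
std::vector<std::int32_t> gemm_int8_generic(const std::vector<std::int8_t>& a,
                                            const std::vector<std::int8_t>& b,
                                            std::size_t rows,
                                            std::size_t inner,
                                            std::size_t cols) {
  std::vector<std::int32_t> c(rows * cols, 0);
  for (std::size_t i = 0; i < rows; ++i) {
    for (std::size_t k = 0; k < inner; ++k) {
      // Widen once per (i, k) so the innermost loop walks B and C contiguously.
      const std::int32_t a_ik = a[i * inner + k];
      for (std::size_t j = 0; j < cols; ++j) {
        c[i * cols + j] += a_ik * static_cast<std::int32_t>(b[k * cols + j]);
      }
    }
  }
  return c;
}
```

A loop like this would run anywhere WebAssembly runs, but without vector instructions it processes one multiply-add at a time, so it should be expected to be far slower than the SIMD kernels.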

[^1]: WASM (and WASM SIMD) don't map directly to x86 instructions. See https://emscripten.org/docs/porting/simd.html for an (inverse) list of how WASM instructions map to native ones.

XapaJIaMnu commented 1 year ago

A slight correction: we need SSSE3, as SSE2 doesn't have the 8-bit instructions we rely on... I was wondering if wasm could do some unrolling emulation like macOS does, but I guess it doesn't.
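For context, the 8-bit instruction in question is presumably `pmaddubsw` (the `_mm_maddubs_epi16` intrinsic), which arrived with SSSE3 and has no single-instruction SSE2 equivalent. The sketch below is a scalar reference for what that instruction computes on one 128-bit register, which is roughly what any emulation would have to reproduce. The function name `maddubs_reference` is hypothetical and this is not code from intgemm.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar reference for the semantics of _mm_maddubs_epi16 (pmaddubsw):
// multiply 16 unsigned bytes by 16 signed bytes, add each adjacent pair of
// products, and saturate every sum to the signed 16-bit range.
std::array<std::int16_t, 8> maddubs_reference(
    const std::array<std::uint8_t, 16>& a,
    const std::array<std::int8_t, 16>& b) {
  std::array<std::int16_t, 8> out{};
  for (std::size_t i = 0; i < 8; ++i) {
    const std::int32_t lo = std::int32_t(a[2 * i]) * std::int32_t(b[2 * i]);
    const std::int32_t hi =
        std::int32_t(a[2 * i + 1]) * std::int32_t(b[2 * i + 1]);
    std::int32_t sum = lo + hi;
    if (sum > 32767) sum = 32767;    // saturate high
    if (sum < -32768) sum = -32768;  // saturate low
    out[i] = static_cast<std::int16_t>(sum);
  }
  return out;
}
```

One hardware instruction does all sixteen multiplies and eight saturating adds at once, which gives a sense of why a scalar replacement costs so much.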

The other possibility is to use the ONNX GEMM compilation path, which will be really, really slow...