emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

Is WebAssembly slower than NaCl? #12989

Open waghamolm opened 3 years ago

waghamolm commented 3 years ago

We have a Chrome extension written in C++ and JavaScript. The extension uses Chrome's NaCl sandbox to run compiled C++ code in the browser efficiently and securely. Recently, we started migrating the extension from NaCl to WebAssembly, using the Emscripten compiler v2.0.7 to compile the C++ code. Without modifying the C++ code, and using the same optimisation level during compilation, we found that the WebAssembly build is slower than the NaCl build.

While analysing the performance degradation in the WebAssembly build, I found that the goto statement is one of the root causes. I prepared two simple Chrome extensions, one using NaCl and one using WebAssembly, that use a goto statement to simulate loop iteration. The results show that the NaCl build is faster than the WebAssembly build.

Is migrating to WebAssembly expected to degrade performance relative to NaCl?

neelance commented 3 years ago

This seems related to https://github.com/WebAssembly/design/issues/796

waghamolm commented 3 years ago

Thank you @neelance for the reply.

From the link you shared, I inferred the following points. Please confirm whether my understanding is correct.

  1. Whether goto should be supported is still under discussion for the WebAssembly standard.
  2. However, the Emscripten compiler seems to support it.

The attached Chrome extensions use a goto statement to simulate a for loop iterating from 1 to 1000. The following results were obtained on a test machine running Ubuntu 18.04.5 LTS and Chrome v86.0.4240.75.

Results of the NaCl-based Chrome extension:

Iteration value in micro sec: 1.132
Iteration value in micro sec: 1.343
Iteration value in micro sec: 0.837
Iteration value in micro sec: 0.937
Iteration value in micro sec: 0.917

Results of the WebAssembly-based Chrome extension:

Iteration value in micro sec: 20.000000
Iteration value in micro sec: 5.000000
Iteration value in micro sec: 5.000000
Iteration value in micro sec: 5.000000
Iteration value in micro sec: 15.000000

The results show that the WebAssembly build compiled with Emscripten is slower than the NaCl build in such cases. Are there other areas (besides the goto statement) where we can expect WebAssembly to perform worse than NaCl?

kripken commented 3 years ago

@waghamolm

In general, wasm and NaCl use very different routes for compiling and running code. For example, NaCl was compiled with LLVM or gcc, while wasm VMs are generally designed more for speed of compilation than LLVM is. So it is expected to see performance differences, which could be due to register allocation (where LLVM tends to be better than wasm VMs) and other factors. Control flow as @neelance mentioned may be a factor as well.

If you can create a benchmark of your code, that could be useful to investigate.

waghamolm commented 3 years ago

Thank you @kripken for the answer. As mentioned in the document, we observed similar results regarding the slowness of WebAssembly. For us, on average, the WebAssembly-based Chrome extension is 1.3x slower than the NaCl-based Chrome extension, and in the worst case it's 2x slower.

As for benchmarking the attached examples, I am not aware of the standard (or correct) way of doing that. The shared example has the following function written in C++.

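#include <chrono>
#include <cstdint> // for int32_t
#include <iomanip> // for setprecision
#include <iostream>
using namespace std;

  // Counts from 1 up to `limit` with a goto-based loop and prints the
  // elapsed time in microseconds.
  void gotoStatement(int32_t limit) {
    auto t_start = std::chrono::high_resolution_clock::now();
    int32_t number = 1;
    repeat:
      number++;
      if (number <= limit)
        goto repeat;
    auto t_end = std::chrono::high_resolution_clock::now();
    // `number` is never read after the loop, so nothing here forces the
    // compiler to keep the loop.
    std::cout << std::setprecision(6) << std::fixed
        << std::chrono::duration<double, std::micro>(t_end-t_start).count()
        << std::endl;
  }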

The time logs shared earlier are from the WebAssembly and NaCl builds with the limit set to 1000. However, I later realised that even if the limit is set to 1 (i.e. the loop body runs only once), the results remain the same.

kripken commented 3 years ago

It can be hard to benchmark stuff like this, yeah. If the results don't change when you change the limit, maybe the compiler just optimizes out the entire loop.

Even if you do measure a loop with interesting work, though, small microbenchmarks like that can be very noisy, as they depend on tiny details of the wasm VM's decisions on register allocation and so forth.

Benchmarking on real-world code is usually better. A 30% slowdown compared to NaCl is maybe not that surprising, given the issues mentioned earlier - NaCl uses LLVM or gcc while wasm VMs are simpler. In time that will improve, but there is no quick way to match LLVM or gcc.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.