Experiment: outsource hot code paths to rust-based NIFs

sebastian commented 7 years ago

One can write rust based NIFs using rustler. With rust we could get close to C-performance with no runtime GC or other shenanigans, a relatively high-level language, while having type safety and lots of guards making it a language I personally (as a non-C expert) would be comfortable writing in.

It would be interesting to try using rust as an experiment on some hot parts of our system and compare performance and weight it against the added complexity. A candidate for such an experiment would be the sum anonymizer, since it is used by count, our most common aggregator.

Another, albeit more complex, place where it could be very worthwhile to play with would be the emulator.

Thoughts? (I think this was originally suggested by @sasa1977)

obrok commented 7 years ago

I like the idea, and would like to try my hand at this. I'd also feel more comfortable having rust than C, although maybe that's a false comfort, that will evaporate once I learn more of rust.

sebastian commented 7 years ago

Here is a language doc: https://doc.rust-lang.org/book/second-edition/

As a simple simple simple test, we could try something as isolated as the float normalizer. That gives us the infrastructure and some minimal experience with it.

cristianberneanu commented 7 years ago

From my previous tests with NIFs, I doubt this will be worth it. There are two main problems with this approach:

Our performance costs are mostly evenly distributed (the low-hanging fruits were picked).
The transition into a NIF and back is costly.

There isn't one simple function that is a big bottleneck during query execution. This is why, when converting some of the inner steps into NIFs, the performance gain will be offset by the transition costs.

Rewriting some of bigger modules into Rust will be complicated. Not only there are complications with NIFs taking too long (even with dirty-schedulers), but we lose the flexibility and ease of development we currently have with having the entire system written in Elixir.

sebastian commented 7 years ago

Ok, that's sobering. Then we need to find other areas to squeeze more performance out of our system.

cristianberneanu commented 7 years ago

To be more clear: I am not saying we won't see any gains. I estimate that we can possibly get a ~20% improvement in running time, but we will pay a big cost in development complexity for it.

NIFs are good for optimizing CPU intensive tasks, but most of our pipeline is memory intensive. We need to allocate less data to do it less often. We need to transfer less data. We need to generate less garbage.

sebastian commented 7 years ago

Another alternative here is to use reasonml. It's fast, has a kick ass type system by being ocaml, and very familiar to read and write.

cristianberneanu commented 6 years ago

I didn't know exactly where to put this, but here seems like the best location for it.

In the last week I tried, unsuccessfully, to speed up query execution using native code. The following is more a chronicle of the multiple ways I failed, so there isn't anything actionable or constructive here, but maybe someone else learns something from this or avoids going trough the same path in the future.

Since the current implementation of noise layers heavily uses sets of hashes, which consume a significant amount of time being moved around, processed and combined, I thought that by moving storage and processing of that data inside Rust will get us a significant improvement in query execution speed. Unfortunately, I was wrong and it seems the current implementation using standard containers is more than adequate.

First, I implemented a few small functions using rustler and tested their performance. Surprisingly, they were sometimes slower than equivalent Elixir code. I isolated the problem to the rustler abstraction layers, that takes care of decoding the input, handling errors and encoding the response, which adds too much overhead for very simple functions. I then dropped Rustler from the native library and re-implemented all the code using unsafe Rust. The performance significantly improved, the NIF being 5 times faster than the equivalent Elixir code.

Afterwards, I implemented an integer set container as a NIF resource, using the standard Rust collections. I measured the performance of a synthetic benchmark and, again, I noticed decreased performance. Changing the collection from a HashSet to a Vec helped somewhat, so I looked into how the Rust HashSet container works internally. I found out that the default hasher is pretty slow, as it is optimized to avoid DoS scenarios.

I did more tests with different hashing functions, like FNV1a, MurmurHash, xxHash, CityHash, etc. until I realized that we actually store hashes in the container and there is no point in hashing the values again. After modifying the HashSet to use a null hasher method, performance improved again significantly, the NIF container being 3 times faster than the standard Elixir MapSet (which is based on an Erlang map).

I modified the noise layer module to use this new IntSet container and the execution time increased hundreds of times for some types of queries (those that return many buckets). After more measurements and debugging, I found out that the root of the performance drop was in the method that combines two sets.

Internally, Erlang uses persistent Hash Array Mapped Tries (HAMT) for maps, which efficiently compress the stored data into nodes, and have a very performant method of combining two maps by re-using nodes whenever possible.

Unfortunately, I haven't found a good implementation of HAMTs for Rust that also supports merging two containers efficiently (and neither for C). Writing one from scratch seemed like it would require a significant amount of time. So I spent more time trying to optimize the union of the two sets inside Rust by using different collection types, using bloom filters in order to avoid extra-lookups, avoiding merging sets and just storing them in a list, etc. Some methods were always slower, other were sometimes faster, in some scenarios, but none gave consistently better performance.

Since I already wasted too much time on this, I decided to call it quits. What I learned for this:

Standard functions are faster than nifs.
For very small nifs, rustler is not appropriate as it introduces significant overhead.
Erlang maps are very optimized and efficient.
Rust's collections have many pitfalls and don't support some scenarios that well.

I think there is no point in trying to optimize the current implementation more, as we will get increasingly diminishing returns, if any. I will take another look at improving the design instead, I feel that has more potential for getting improvements.

sebastian commented 6 years ago

Thanks for the update @cristianberneanu.

sebastian commented 6 years ago

I suggest we close this issue for now.

Aircloak / aircloak

Experiment: outsource hot code paths to rust-based NIFs #1724