BitFunnel / NativeJIT

A C++ expression -> x64 JIT
http://bitfunnel.org/
MIT License
1.14k stars 83 forks source link

Magic buffer sizes #69

Open joto opened 7 years ago

joto commented 7 years ago

All the sample code contains the same lines with magic numbers:

ExecutionBuffer codeAllocator(8192);
Allocator allocator(8192);
FunctionBuffer code(codeAllocator, 8192);

I have been running into cases with more complex expressions where this is too small. But there is no documentation on how big this should be and how the different buffers fit together. Of course it would be even better if this allocation was dynamic, so I don't have to know...

As a side note: FunctionBuffer takes an unsigned as capacity while ExecutionBuffer and Allocator take a size_t which is inconsistent.

MikeHopcroft commented 7 years ago

Thanks for raising this issue. Magic numbers are indeed problematic.

Historically speaking, eliminating the need for the magic numbers wasn’t a priority in Bing because their use case employed fixed-sized buffers in order to maximize performance. Using fixed-sized buffers allowed them to eliminate all heap allocations on the query processing path. Eliminating heap allocations mainly had an impact on reducing lock contention in a multi-threaded environment. The Bing use case went to extreme lengths to boost performance. As an example, they also used the virtual memory system to allocate guard pages beyond the end of buffers to catch overflows so that they could eliminate bounds checking on each byte write operation.

Most users of NativeJIT don’t need these extreme optimizations and could benefit from buffers that grow automatically as needed. I did a brief investigation and feel it would be feasible to add this support to the codebase without hurting the performance for those who want to use fixed sized buffers.

Before I get into the details, I'll suggest that in many use cases you can just pick magic numbers that are ridiculously large and move on. For example, give each buffer 1Mb. This is likely to work in the vast majority of cases and is still inexpensive on modern hardware.

If you are interested in eliminating magic numbers from the API, the details follow. If you are considering putting together a PR, please check with us first to increase the likelihood that we will be able to take your contribution.

MikeHopcroft commented 7 years ago

With regards to @joto's side note, we're transitioning the code base to use size_t anytime we want a default size unsigned value. Since our compiler targets x64, we feel that there is no performance penalty to using size_t in our API (vs unsigned). Once we've finished transitioning to size_t, the only smaller types will be in places where we require a smaller size (e.g. field of a size-sensitive struct). In general, we're trying to avoid signed types as well unless the use case requires negative numbers. We will transition the code over time, but the priority is low - we're mainly fixing things as we encounter them while implementing higher priority changes. Please feel free to submit PRs on this topic.