Open parisholley opened 5 years ago
Thanks for this @parisholley. Internally, we had recently been discussing the benefits of a Rust version so it's very helpful to have this work as a start point. Also good to know there is demand out there for it!
We'll aim to get this reviewed and merged into the master branch in the next few weeks.
FYI, cut a crate so i can start getting deploys going in my test environment, can add your team as owners if you decide to take the lib:
@Steve51D no worries, i'm behind on my roll out anyway so this isn't in production yet
@Steve51D heads up, i'm working on a new version based on the v4 tag. it won't be implementing the pipeline stack, i'll just be hooking directing into the EngineHash
for simple on-premise device lookups, but it will at least be a starting point for you guys.
i still have way more to go, but have a working POC of tying the rust code with the c++ library:
https://github.com/mantisadnetwork/device-detection-rust
so far, only have it building on mac.. from what i can tell on my machine, i can get 0.0015ms per lookup performance
anywhere I can see how you ran the benchmarks so I can do a fair comparison?
@Steve51D thanks for that link, i've updated the readme with instructions on compiling and benchmarking locally. it has been rewritten to use the c library instead of the c++ version i was doing before.
running the benchmarks locally, i see 4x .NET in max performance mode (as listed on the public page, i haven't run .NET perf locally)
the implementation has been limited in scope to my use case, so it isn't the full c API translated over. there is likely to be some refactoring/cleaning to be done but i believe what is there to be structurally sound(c bindings). may help to get on a call with one of your engineers to verify how I integrated with the C library is valid and that the benchmark implementation is comparable
Issuing a pull just to share WIP and discuss intentional design decisions:
I used bindgen to put together boilerplate extern code, this is going to get cleaned up at some point
I decided to build the library with threading turned off as rust will not allow the user to mutate the underlying provider in a thread-unsafe way (at the moment, I do not have reloading, etc implemented).
From what I understand of the library, once the dataset has been created from the file, it should be safe to request the "property indexes" ahead of time to save on lookup costs, let me know if you believe this is a risk due to runtime mutation.
I like the idea of compile-time checked property interactions, eg: PropertyName::BrowserVersion and PropertyValue::BrowserVersion("1.2.3"), rust makes these nice and clean but it will require changes to the library when new properties are added to the data set.
Undecided on whether the "value" types will be part of the shipped library, or instead as a feature that can be turned on. As an example, some implementors may only care about values as it pertains to OpenRTB (like device type) and do not want/need the specificity provided by the library. Another example would be the different browser sub-types (eg: "Facebook for Android", "Facebook for BlackBerry"), to me they are just "Facebook", but perhaps someone wants the detail. The other downside is that for some property types, the unique values will continue to go up over time, which is another thing to manage in the code base.
Benchmark Results
On my macbook pro:
Trie
cargo bench --features trie trie_bench
test trie::tests::trie_bench ... bench: 253 ns/iter (+/- 60)
Pattern
cargo bench --features pattern pattern_bench
test pattern::tests::pattern_bench ... bench: 2,789 ns/iter (+/- 411)