dosaboy / searchkit

Apache License 2.0

search: introduce a new hyperscan-backed searchdef type: `HyperscanSearchDef` #6

Open mustafakemalgilor opened 1 year ago

mustafakemalgilor commented 1 year ago

Searchkit currently uses Python's `re` module, which is not known for its "blow your socks off" pattern-scanning performance, so there is an opportunity for optimization simply by swapping the regex engine.

Hyperscan is a highly optimized regex engine that is typically used for pattern matching in high-throughput network packet inspection systems (e.g. DPI and IDS/IPS). Searchkit's workload, matching regular expressions against potentially large files, aligns well with Hyperscan's strengths, so it would be beneficial for searchkit to let downstream users leverage Hyperscan, especially when searching large files.

This patch introduces a Hyperscan-backed SearchDef type, `HyperscanSearchDef`, which can be used as a drop-in replacement for the existing `SearchDef` type. The patch also adds hyperscan as a dependency and moves the searchkit tests into a base class so the same tests run against both `SearchDef` and `HyperscanSearchDef`.
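Sharing one test suite between two engines is a common `unittest` pattern: engine-agnostic tests live in a mixin, and each concrete `TestCase` subclass plugs in one SearchDef type. The following is a minimal sketch of that pattern, not the code from this PR. `ReSearchDef`, its `search()` method and `SearchDefTestsBase` are hypothetical names, with `ReSearchDef` standing in for searchkit's `re`-backed `SearchDef`; the real searchkit interface may differ.

```python
import re
import unittest


class ReSearchDef:
    """Hypothetical stand-in for searchkit's re-backed SearchDef."""

    def __init__(self, pattern):
        self.regex = re.compile(pattern)

    def search(self, line):
        # Return the capture groups of the first match in `line`, or None.
        match = self.regex.search(line)
        return match.groups() if match else None


class SearchDefTestsBase:
    """Engine-agnostic tests; each subclass selects an engine via `searchdef_cls`.

    This mixin deliberately does not inherit from unittest.TestCase, so the
    test loader never collects it on its own (where it would have no engine).
    """

    searchdef_cls = None  # set by each concrete test class

    def test_captures_group(self):
        sd = self.searchdef_cls(r"^ERROR: (\S+)")
        self.assertEqual(sd.search("ERROR: disk-full"), ("disk-full",))

    def test_no_match(self):
        sd = self.searchdef_cls(r"^ERROR: (\S+)")
        self.assertIsNone(sd.search("INFO: all good"))


class ReSearchDefTests(SearchDefTestsBase, unittest.TestCase):
    searchdef_cls = ReSearchDef


# Hypothetical: a second engine is wired in with one more class, assuming
# HyperscanSearchDef exposes the same constructor and search() interface.
# class HyperscanSearchDefTests(SearchDefTestsBase, unittest.TestCase):
#     searchdef_cls = HyperscanSearchDef
```

Running `python -m unittest` on such a module collects only the concrete `TestCase` subclasses, so every engine runs the full shared suite and adding an engine costs a single class definition.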