nblumhardt closed 4 years ago
So, this one is a little bit of a "submarine" feature, in that we've been working on it for some time without saying too much about it - but finally we have a publicly-consumable 2020.1 build that includes most of it, and I can fill in some details :-)
Our aim with this feature is to gain performance and reduce hosting costs by executing from-disk queries more efficiently.
As you may know, Seq was originally implemented 100% in C# on the .NET Framework, on Windows. Seq 5 broke this mold and became the first version of Seq to run cross-platform, using .NET Core. Taking Seq cross-platform also meant breaking our dependency on ESENT (Windows-only) for event storage, and we introduced the native "Flare" storage engine built in Rust.
The Rust side of the codebase has proven incredibly solid and maintainable in the first two years of its life - probably not surprising, as the language is designed with a strong bias towards the kinds of low-allocation, high-performance, concurrent systems that we're building. It's not replacing C# in our codebase, but in the low-level event storage and query layers we're keen to use it much more.
Seq 5's storage engine handles raw bytes only: it doesn't know anything about events, let alone JSON or query evaluation.
In Seq 2020, we've moved the next layer of search and query evaluation down into the Rust part of the codebase. Event searches, and the projection phase of SQL-style query evaluation, can now execute in native code.
The gains we get from this are less CPU usage at interop boundaries, less pressure on the .NET GC, and more opportunities to optimize by integrating the storage engine and expression evaluator. For example, the query:
select mean(Elapsed) from stream where RequestPath = '/test'
will execute a projection in native code resembling:
select Elapsed from stream where RequestPath = '/test'
The aggregate expression mean(Elapsed) is still executed in C#; however, only the value of the Elapsed property is passed back to .NET, and only for events where RequestPath = '/test'.
Additionally, we skip deserializing any fields from the event except Elapsed and RequestPath - much, much less work when events are large.
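The split described above can be modelled with a toy sketch. This is Python rather than Seq's actual Rust and C# code, and every name in it (EVENTS, native_projection, mean_elapsed, the Payload field) is made up for illustration. It only shows the data flow: a "native" step filters and yields bare Elapsed values, and a separate "managed" step aggregates them. It doesn't model the deserialization savings.

```python
from statistics import mean

# Hypothetical in-memory stand-in for events that would live on disk.
EVENTS = [
    {"RequestPath": "/test", "Elapsed": 12.0, "Payload": "x" * 1000},
    {"RequestPath": "/other", "Elapsed": 99.0, "Payload": "y" * 1000},
    {"RequestPath": "/test", "Elapsed": 18.0, "Payload": "z" * 1000},
]


def native_projection(events, path):
    """Stand-in for the native step: filter, then yield only Elapsed."""
    for event in events:
        if event.get("RequestPath") == path and "Elapsed" in event:
            # Only the single projected value crosses the "boundary".
            yield event["Elapsed"]


def mean_elapsed(events, path):
    """Stand-in for the managed step: aggregate the projected values."""
    values = list(native_projection(events, path))
    return mean(values) if values else None
```

Calling mean_elapsed(EVENTS, "/test") aggregates just the two matching Elapsed values; the large Payload fields never reach the aggregation step.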
Since less GC pressure is applied, we're a bit braver about query parallelism, and will use (by default) 4 threads to execute the query to try to squeeze more utilization out of the server.
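As a rough illustration of running one query across 4 workers, here's a minimal Python sketch. How Seq actually partitions the work isn't described here, so the interleaved chunking and the partial_sum_count and parallel_mean helpers are assumptions. Python threads also won't give real CPU parallelism for this kind of loop; the sketch only shows the split-and-merge shape.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(chunk, path):
    """Filter one chunk and return (sum, count) of matching Elapsed values."""
    total, count = 0.0, 0
    for event in chunk:
        if event.get("RequestPath") == path and "Elapsed" in event:
            total += event["Elapsed"]
            count += 1
    return total, count


def parallel_mean(events, path, workers=4):
    """Compute mean(Elapsed) by merging per-worker partial results."""
    # Assumed partitioning: split events into `workers` interleaved chunks.
    chunks = [events[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: partial_sum_count(c, path), chunks))
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None
```

Each worker returns a sum and a count rather than a mean, so the partial results can be merged exactly at the end.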
Even plain event searches like:
@Message like 'starting up on %'
execute native queries behind the scenes, in this case resembling:
select @Document from stream where @Message like 'starting up on %'
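To make the shape of such a search concrete, here's a small Python sketch that filters on @Message and returns whole events. It assumes conventional SQL-style wildcards (% for any run of characters, _ for a single character); Seq's real matching rules, such as case sensitivity, may differ, and like_to_regex and search are hypothetical helpers, not Seq APIs.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL-style LIKE pattern (% and _) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def search(events, pattern):
    """Return whole events whose @Message matches the pattern in full."""
    rx = like_to_regex(pattern)
    return [e for e in events if rx.fullmatch(e.get("@Message", ""))]
```

Unlike the aggregate example, the "projection" here is the entire event, which is what a search result needs to display.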
It's important to note that native queries don't speed up searches served from the RAM cache: currently, 2020.1 is only a little faster than Seq 5.1 when serving results from RAM. It's outside of the RAM cache region, when the query engine has to retrieve data from disk, that native queries should have a noticeable impact.
We're planning some blog posts talking about all of these points. In the meantime, if you have a non-production environment that you'd like to try this in, the new preview builds of 2020.1 have native queries enabled. Feedback and results most welcome :-)
Edit: just adding - you can find the Windows installer at https://datalust.co/download, and the Docker image is datalust/seq:preview. 👋
Seq 5.1 runs queries in C# atop a native code storage engine written in Rust.
In Seq 2020 we'll move significant amounts of query processing down into the Rust storage engine to avoid interop overheads and open up more optimization techniques.