ashvardanian / StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖
https://ashvardanian.com/posts/stringzilla/
Apache License 2.0
1.92k stars 64 forks source link

Initial Go support #153

Open MarkReedZ opened 2 months ago

MarkReedZ commented 2 months ago

Added a few functions for go. If we merge this I'll finish it up.

$ go run scripts/bench.go
Contains
   38.798µs     strings.Contains
   37.872µs     sz.Contains
Index
   26.29µs  strings.Index
   37.712µs     sz.Index
IndexAny
   588.567µs    strings.IndexAny
   45.911µs     sz.IndexAny
ashvardanian commented 2 months ago

Wow, I am surprised the latency is comparable. How long is the input?

I believe for Go, it would be great to transpile our C code.

MarkReedZ commented 2 months ago

I'll look at transpiling. I don't expect there to be a significant improvement on these benchmarks - its probably likely to slow things down considerably from what I've read.

The strings are 1mb. Below are 10kb strings. Short strings show the same performance except for IndexAny where the go time eventually matches stringzilla.

$ go run bench.go
Contains
   607ns    strings.Contains
   804ns    sz.Contains
Index
   324ns    strings.Index
   492ns    sz.Index
IndexAny
   5.917µs  strings.IndexAny
   733ns    sz.IndexAny
ashvardanian commented 2 months ago

How about super short inputs in the range of 10-100 bytes?

MarkReedZ commented 2 months ago

None of the transpilers work on the stringzilla code. I don't see much information on them.

Build flags are the following by the way. -O3 slows us down.

// #cgo CFLAGS: -g -mavx2
// #include <stdlib.h>
// #include <../../include/stringzilla/stringzilla.h>
import "C"

100b

Contains
   245ns    strings.Contains
   425ns    sz.Contains
Index
   139ns    strings.Index
   137ns    sz.Index
IndexAny
   181ns    strings.IndexAny
   275ns    sz.IndexAny

16b

Contains
   138ns    strings.Contains
   452ns    sz.Contains
Index
   56ns     strings.Index
   198ns    sz.Index
IndexAny
   205ns    strings.IndexAny
   262ns    sz.IndexAny
MarkReedZ commented 2 months ago

Others benchmarking cgo state its 40ns per C call, but I don't see it being that slow. I'll get more of the funcs in as I'm using this in some of my go projects. We can also try manually writing a func or two in go to see how that goes.

MarkReedZ commented 2 months ago

@ashvardanian We do have an issue with repeated c calls. Count is faster until we match too many times. It is ~50ns per call to sz_find. We can?

A) Write Count in C so its one function call to C instead of N calls B) Write sz_find in go

Let me know and I'll go ahead. The below numbers are for a 9 byte substring to match - strings.Count is faster if the substring to search for is 4 bytes or less.

Count 1  
   55.06µs  strings.Count
   40.692µs     sz.Count
Count 100,000 
   1.344558ms   strings.Count
   6.815231ms   sz.Count
ashvardanian commented 2 months ago

How about we implement the count in the cGo header?