BurntSushi / aho-corasick

A fast implementation of Aho-Corasick in Rust.
The Unlicense
1.03k stars 93 forks source link

Go wrapper for aho-corasick #125

Closed tmikus closed 1 year ago

tmikus commented 1 year ago

Hey Andrew,

I hope you find this useful - I just made a Go wrapper for your amazing library: https://github.com/tmikus/ahocorasick_rs.

It's incredible how much faster your library is compared to the one available in Go, so I thought I might spread your awesomeness to that language. You're welcome to add a link to it in the "FFI bindings" section if you feel like the library is up to the scratch.

In my very simplified benchmark it turned out that the wrapper around your library is much faster than both https://github.com/cloudflare/ahocorasick and github.com/BobuSumisu/aho-corasick

BurntSushi commented 1 year ago

Hiya! That's pretty awesome. Can you share your benchmark results? :-)

Out of curiosity, do you have an ultimate use case for the library? Like, what made you write the binding in the first place?

I do have some feedback.

Firstly, when I run go test, I get this:

$ go test
go: errors parsing go.mod:
/home/andrew/clones/ahocorasick_rs/go.mod:3: invalid go version '1.21.0': must match format 1.23

Once I switched it to 1.21, things worked. I was somewhat surprised, because I didn't think the Go toolchain knew how to build Rust libraries. Then I realized that you bundled pre-compiled libraries and hard-code paths to them:

/*
#cgo darwin,arm64 LDFLAGS: -L./lib/darwin -lahocorasick_rs_arm64
#cgo darwin,amd64 LDFLAGS: -L./lib/darwin -lahocorasick_rs_amd64
#cgo linux,arm64 LDFLAGS: -L./lib/linux -lahocorasick_rs_arm64
#cgo linux,amd64 LDFLAGS: -L./lib/linux -lahocorasick_rs_amd64
#cgo windows,amd64 LDFLAGS: -L./lib/windows -lahocorasick_rs_amd64
#include "./lib/ahocorasick_rs.h"
#include <stdlib.h>
*/
import "C"

I'm pretty sure this is no idiomatic. See for example how I structured Go bindings to Rust's regex crate (and the README contains build instructions): https://github.com/BurntSushi/rure-go

Other remarks:

For reference, my benchmark results are:

$ GOMAXPROCS=1 go test -bench .
.
1 total assertion

..
3 total assertions

goos: linux
goarch: amd64
pkg: github.com/tmikus/ahocorasick_rs
cpu: 12th Gen Intel(R) Core(TM) i9-12900K
BenchmarkAhoCorasickGoCloudflare             928           1270858 ns/op
BenchmarkAhoCorasickGoBobuSumisu             618           1956127 ns/op
BenchmarkAhoCorasickRs                      1648            727579 ns/op
PASS
ok      github.com/tmikus/ahocorasick_rs        3.997s

My suspicion is that there should be cases where the other libraries are faster. IIRC, Go's FFI is quite expensive, so a benchmark on a very short haystack might fair better with a native Go library. They might be able to start and finish a search before your Rust wrapper can even start its search.

With all that said, it looks like you put a lot of work into this library. Very nicely done. Other than the Close thing, the API looks very comprehensive. :-)

BurntSushi commented 1 year ago

Perhaps it would make sense to hook it up to this benchmark? https://github.com/BobuSumisu/aho-corasick-benchmark

It has the benefit of being a benchmark not written by me or you. :-)

tmikus commented 1 year ago

Thank you for the amazing feedback! I couldn't have hoped for a better review of this Work-in-progress library.

Out of curiosity, do you have an ultimate use case for the library? Like, what made you write the binding in the first place?

In my job we might soon be building a Go app that would benefit from having an access to a very fast AhoCorasick library, and after doing some comparison I found that yours was the best one I found. Building a new project with Rust woudn't go down too well with the leadership so I decided to go down the road of the least-resistance, and make the wrapper for Rust version of your library myself 😊

Firstly, when I run go test, I get this:

$ go test
go: errors parsing go.mod:
/home/andrew/clones/ahocorasick_rs/go.mod:3: invalid go version '1.21.0': must match format 1.23

I suspect this is because of the minimum version of go in the go.mod file being set to 1.21, which actually can be lowered. I'll check what is the actual lowest version of Go this package needs.

I was somewhat surprised, because I didn't think the Go toolchain knew how to build Rust libraries. Then I realized that you bundled pre-compiled libraries and hard-code paths to them:

This is mostly because I don't want people to have to install Rust and manually compile this project just to run the library. I'll check that library you sent to see if there's a nicer way to accomplish this.

  • Why do you ask the user to "close" the AhoCorasick searcher after they're done with it? AFAIK, the idiomatic solution here in Go-land is to use runtime.SetFinalizer for deallocating resources across an FFI boundary.

Oh nice, I also don't like that Close() function... Thank you for pointing me in the direction of SetFinalizer!

Hiya! That's pretty awesome. Can you share your benchmark results? :-)

Out of curiosity, do you have an ultimate use case for the library? Like, what made you write the binding in the first place?

I do have some feedback.

Firstly, when I run go test, I get this:

$ go test
go: errors parsing go.mod:
/home/andrew/clones/ahocorasick_rs/go.mod:3: invalid go version '1.21.0': must match format 1.23

Once I switched it to 1.21, things worked. I was somewhat surprised, because I didn't think the Go toolchain knew how to build Rust libraries. Then I realized that you bundled pre-compiled libraries and hard-code paths to them:

/*
#cgo darwin,arm64 LDFLAGS: -L./lib/darwin -lahocorasick_rs_arm64
#cgo darwin,amd64 LDFLAGS: -L./lib/darwin -lahocorasick_rs_amd64
#cgo linux,arm64 LDFLAGS: -L./lib/linux -lahocorasick_rs_arm64
#cgo linux,amd64 LDFLAGS: -L./lib/linux -lahocorasick_rs_amd64
#cgo windows,amd64 LDFLAGS: -L./lib/windows -lahocorasick_rs_amd64
#include "./lib/ahocorasick_rs.h"
#include <stdlib.h>
*/
import "C"

I'm pretty sure this is no idiomatic. See for example how I structured Go bindings to Rust's regex crate (and the README contains build instructions): https://github.com/BurntSushi/rure-go

Other remarks:

* Why is `ahocorasickkind` a sub-package? Why not just include the enum in the top-level package?

* Why do you ask the user to "close" the `AhoCorasick` searcher after they're done with it? AFAIK, the idiomatic solution here in Go-land is to use [`runtime.SetFinalizer`](https://pkg.go.dev/runtime#SetFinalizer) for deallocating resources across an FFI boundary.

* I'd suggest that your README include a section about why someone might want to use your library instead of a native Go library. "It's faster" is perhaps the easy answer, but it would be good to accompany that with data. There are likely other answers too. This aho-corasick library is, I believe, one of the most featureful in existence. The leftmost-first and leftmost-longest are, in particular, a somewhat unique (but not absolutely unique) offering.

For reference, my benchmark results are:

$ GOMAXPROCS=1 go test -bench .
.
1 total assertion

..
3 total assertions

goos: linux
goarch: amd64
pkg: github.com/tmikus/ahocorasick_rs
cpu: 12th Gen Intel(R) Core(TM) i9-12900K
BenchmarkAhoCorasickGoCloudflare             928           1270858 ns/op
BenchmarkAhoCorasickGoBobuSumisu             618           1956127 ns/op
BenchmarkAhoCorasickRs                      1648            727579 ns/op
PASS
ok      github.com/tmikus/ahocorasick_rs        3.997s

My suspicion is that there should be cases where the other libraries are faster. IIRC, Go's FFI is quite expensive, so a benchmark on a very short haystack might fair better with a native Go library. They might be able to start and finish a search before your Rust wrapper can even start its search.

With all that said, it looks like you put a lot of work into this library. Very nicely done. Other than the Close thing, the API looks very comprehensive. :-)

Thanks, I'll work on adding benchmark section. I'll try to use someone else's benchmarks to avoid the pitfal of over-optimising code for my own benchmarks.

Perhaps it would make sense to hook it up to this benchmark? https://github.com/BobuSumisu/aho-corasick-benchmark

Great plan! I'll try to use that :)

Can you share your benchmark results? :-)

Here are my results on my desktop:

goos: linux
goarch: amd64
pkg: github.com/tmikus/ahocorasick_rs
cpu: AMD Ryzen 7 7800X3D 8-Core Processor
BenchmarkAhoCorasickGoCloudflare-16          914           1303852 ns/op
BenchmarkAhoCorasickGoBobuSumisu-16          585           2061054 ns/op
BenchmarkAhoCorasickRs-16                   1446            797147 ns/op
PASS
ok      github.com/tmikus/ahocorasick_rs        3.985s
BurntSushi commented 1 year ago

Nice sounds good!

This is mostly because I don't want people to have to install Rust and manually compile this project just to run the library. I'll check that library you sent to see if there's a nicer way to accomplish this.

Yeah I figured as much. My suspicion is that folks won't be expecting a binary blob they didn't build to get linked into their Go build automatically. So it might wind up making folks want to use your library less than if you required a Rust toolchain. I'm not 100% positive, but that's my sense of the zeitgeist. :-)

tmikus commented 1 year ago

FYI, turns out even despite the FFI penalty the wrapper of your library is still faster than than any other listed in https://github.com/BobuSumisu/aho-corasick-benchmark 😁

Here are the results from my macbook:

          name    patterns        build    search    matches       alloc
    anknown           1000       0.75ms    2.34ms        407     0.06GiB
    bobusumisu        1000       0.72ms    0.55ms        407     0.07GiB
    cloudflare        1000       6.43ms    0.22ms          9     0.12GiB
    iohub             1000       0.37ms    0.41ms        407     0.12GiB
    tmikus            1000       1.50ms    0.23ms        388     0.12GiB

    anknown           2000       1.49ms    2.26ms        413     0.12GiB
    bobusumisu        2000       1.82ms    0.54ms        413     0.13GiB
    cloudflare        2000      18.82ms    0.27ms         13     0.23GiB
    iohub             2000       0.72ms    0.40ms        413     0.23GiB
    tmikus            2000       1.97ms    0.22ms        388     0.23GiB

    anknown           4000       3.11ms    2.31ms       1429     0.24GiB
    bobusumisu        4000       3.44ms    0.57ms       1429     0.26GiB
    cloudflare        4000      29.64ms    0.25ms         45     0.45GiB
    iohub             4000       1.39ms    0.48ms       1429     0.45GiB
    tmikus            4000       3.43ms    0.24ms        972     0.45GiB

    anknown           8000       6.73ms    2.44ms       3485     0.46GiB
    bobusumisu        8000      10.16ms    0.67ms       3485     0.50GiB
    cloudflare        8000      51.83ms    0.30ms         86     0.89GiB
    iohub             8000       2.77ms    0.63ms       3485     0.89GiB
    tmikus            8000       6.56ms    0.29ms       2303     0.89GiB

    anknown          16000      16.11ms    2.63ms       7977     0.92GiB
    bobusumisu       16000      19.34ms    0.86ms       7977     0.99GiB
    cloudflare       16000      98.52ms    0.35ms        173     1.77GiB
    iohub            16000       6.27ms    1.06ms       7977     1.78GiB
    tmikus           16000      12.71ms    0.36ms       5203     1.78GiB

    anknown          32000      28.66ms    2.68ms      10025     1.83GiB
    bobusumisu       32000      31.09ms    0.93ms      10025     1.97GiB
    cloudflare       32000     169.21ms    0.43ms        262     3.50GiB
    iohub            32000      12.86ms    1.27ms      10025     3.52GiB
    tmikus           32000      25.36ms    0.42ms       6280     3.52GiB

    anknown          64000      60.89ms    2.82ms      12505     3.62GiB
    bobusumisu       64000      48.42ms    1.12ms      12505     3.88GiB
    cloudflare       64000     333.61ms    0.53ms        526     6.85GiB
    iohub            64000      25.67ms    1.42ms      12505     6.89GiB
    tmikus           64000      49.33ms    0.43ms       7165     6.90GiB

    anknown         128000     147.33ms    4.54ms      39334     7.09GiB
    bobusumisu      128000     121.79ms    2.46ms      39334     7.62GiB
    cloudflare      128000     761.86ms    1.31ms       1141    13.46GiB
    iohub           128000      56.81ms    3.51ms      39300    13.54GiB
    tmikus          128000     104.42ms    1.22ms      21913    13.55GiB

    anknown         256000     341.93ms    6.23ms      59391    13.95GiB
    bobusumisu      256000     209.34ms    3.91ms      59391    14.99GiB
    cloudflare      256000    1645.95ms    2.03ms       2243    26.84GiB
    iohub           256000     122.38ms    5.90ms      58923    26.99GiB
    tmikus          256000     235.84ms    1.79ms      29451    27.01GiB

    anknown         512000     851.71ms    8.61ms      94000    27.93GiB
    bobusumisu      512000     419.63ms    6.07ms      94000    30.01GiB
    cloudflare      512000    4240.75ms    3.07ms       4490    53.69GiB
    iohub           512000     264.89ms    9.36ms      91986    53.98GiB
    tmikus          512000     526.98ms    2.02ms      38269    54.03GiB
BurntSushi commented 1 year ago

Yeah I would expect that I think. That benchmark isn't latency sensitive because the haystack is big enough. You'll likely need to shrink the unit of work (i.e., use a small haystack) in order to observe FFI overhead impacting benchmarks.

tmikus commented 1 year ago

You'll likely need to shrink the unit of work (i.e., use a small haystack) in order to observe FFI overhead impacting benchmarks.

Thanks, I'll look into that this week after work 😊

On a similar note, have you seen this blog post about how to make Go FFI calls much faster? https://words.filippo.io/rustgo/

BurntSushi commented 1 year ago

Yes. I read it when it was published. Whether folks will want to go through that and rely on it for faster multi-substring search remains to be seen. :-)

To be clear, I'm not talking about FFI overhead from the lens of "this is something you should fix." I'm talking about it because it's important to present as full a picture as you can when discussing performance.

tmikus commented 1 year ago

I just made a slightly modified version of BobuSumisu's benchmark, which tests all libraries against different length input strings: https://github.com/tmikus/aho-corasick-benchmark

The TLDR; is that for inputs below 3k characters it doesn't really matter which Go library one picks. The results are so small that even run-to-run variation can affect the score. For inputs above 3k this wrapper is clearly pulling into the lead.

I feel like chasing the gains provided by the approach from https://words.filippo.io/rustgo/ is not really worth it so I might just leave this project in the state that it is currently, as it's clearly good enough.

Thank you for spending your valuable time on helping me with it!

BurntSushi commented 1 year ago

Perfect! If you want to submit a PR adding a link to your project to the README, that would be great. :-) I would say put it in a new section in the bottom.