Closed: josharian closed this 1 month ago
Thanks, this is interesting. I'll find time to review these.
I do wonder, however, about your use case. Can you elaborate on where higher encoding performance is important to you, or is this just for fun? LLMs are generally pretty slow; in recent comparisons I've seen providers brag that their models reach something like 200 tokens/second (and that's probably on the beefiest hardware). In comparison, unless I've messed up my benchmark, go-sentencepiece encodes at well over half a million tokens/sec, so it's hard for me to imagine a situation where this tokenization is the bottleneck.
> I'll find time to review these
Great, thanks. I made some decisions along the way with less than 100% confidence. Pushback and questions are always welcome.
> Can you elaborate where higher encoding performance is important to you
As part of exploration and dataset preparation, I end up doing mass tokenization runs, so I really feel the performance: it directly impacts my iteration speed as I'm hacking around.
Also, I'm not working with frontier models (Gemma, not Gemini), some of my work involves squeezing out extra performance, and our use case is quite latency-sensitive.
I also have a habit of occasionally taking a day to crush whatever boxes I can in pprof. Cumulatively, that adds up.
Anyway, I put in this time to get to rough parity with the cgo implementation, at least for my use cases, so that I can switch to pure Go without guilt. Plus it was fun. :)
I'll push up a revised copy once the question above is answered, presuming you're happy with my other comments. Sorry for the delay; it's been a busy couple of days.
This is a grab-bag of optimizations. I recommend reviewing commit-by-commit and rebasing instead of squashing.
Their cumulative effect, on my laptop, for an out-of-tree benchmark (sorry) is:
I also will understand if these are viewed as too intrusive/complicated for this codebase. :) I am happy to maintain a fork as needed.