Closed troelskn closed 8 months ago
I've tested this against the test suite of a fairly complex proprietary application and there seem to be no issues. I'm fairly certain it would be safe to merge.
Hi @troelskn, sorry for the delay here. I've just released version 3.1.0, which adds support for injecting your own tokenizer, so you can use your more efficient version for your purposes. I hope this helps!
I went ahead and rewrote the tokenizer for performance. This improves performance considerably: on a large file, runtime drops from a baseline of 4.717s to 1.921s. The memory footprint is unchanged.
I also tried using preg for tokenization, but this didn't really improve the results. I suspect this is because most source files consist of many small single-char tokens.
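To illustrate the trade-off being described (the PR itself is not shown here, so this is a sketch in Python rather than the project's own code, with hypothetical token rules): when the input is dominated by single-character tokens, a hand-rolled character scan does constant work per character, whereas a regex-driven tokenizer pays per-match engine overhead for each tiny token. Both functions below produce the same token stream for ASCII input.

```python
import re

def tokenize_chars(text):
    """Character-scanning tokenizer: one linear pass, no regex engine.

    Runs of word characters become one token; every other character is
    emitted as its own single-char token. (Illustrative rules only, not
    the PR's actual grammar.)
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isalnum() or c == "_":
            # Extend the token across the whole run of word characters.
            j = i + 1
            while j < n and (text[j].isalnum() or text[j] == "_"):
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            # Punctuation and whitespace: one token per character.
            tokens.append(c)
            i += 1
    return tokens

def tokenize_regex(text):
    """Equivalent regex-based tokenizer, one engine match per token."""
    return re.findall(r"[A-Za-z0-9_]+|.", text, re.DOTALL)
```

If most matches are single characters, the regex version invokes the matching machinery once per character anyway, which is consistent with the observation that switching to preg didn't help much here.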
Since this is a complete rewrite of core code, it should probably be tested in depth before merging.