perf record --call-graph dwarf lambcmd.native -i lzdoc.tex -o lzdoc.html
showed that a lot of time is spent in camlPervasives__$5e_1104, and that it
also triggers minor GC cycles very often. That symbol is the Pervasives.(^)
function ($5e is the mangled '^') used in tokenizer.ml.
This patch reduces lambcmd.native time on a 200kB document with lots of RAW
tokens: 10s -> 1.5s.
Repeatedly concatenating lots of RAW tokens is an O(N^2) operation, because
each (^) allocates a new string and copies everything accumulated so far, and
it generates a lot of extra work for the garbage collector too.
Use BatText instead, which stores the text as a rope and so doesn't copy on
every concatenation.
(Could also use a string list and concatenate at the very end, but BatText is
cleaner.)
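For illustration only, a minimal stdlib sketch of the difference between eager
concatenation and the string-list alternative mentioned above. The function
names are hypothetical and this is not the tokenizer.ml code; the patch itself
uses BatText rather than a list.

```ocaml
(* Eager: every (^) allocates a fresh string and copies the whole
   accumulator, so N pieces cost O(N^2) bytes copied in total. *)
let concat_eager (pieces : string list) : string =
  List.fold_left (fun acc s -> acc ^ s) "" pieces

(* Deferred: collect the pieces (in reverse, as a tokenizer accumulating
   them one by one would), then copy each byte once at the very end. *)
let concat_deferred (pieces : string list) : string =
  let rev_pieces = List.fold_left (fun acc s -> s :: acc) [] pieces in
  String.concat "" (List.rev rev_pieces)
```

Both functions return the same string; only the number of intermediate
allocations differs, which is what shows up as minor GC pressure.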