darioteixeira / lambdoc

Lambdoc is a library providing support for semantically complex documents in Ocsigen web applications
GNU General Public License v2.0

Use ropes instead of repeated string concatenation #12

Closed edwintorok closed 9 years ago

edwintorok commented 9 years ago

perf record --call-graph dwarf lambcmd.native -i lzdoc.tex -o lzdoc.html showed that a lot of time is spent in camlPervasives__$5e_1104, and that it also triggers minor GC cycles very often. That symbol is the Pervasives.(^) string concatenation operator ($5e is the encoded ^ character), which is used in tokenizer.ml.

This patch reduces lambcmd.native run time on a 200kB document with lots of RAW tokens from 10s to 1.5s. Repeatedly concatenating lots of RAW tokens is an O(N^2) operation, because each concatenation copies the whole string accumulated so far, and it generates a lot of extra work for the garbage collector too. The patch uses BatText instead, which is based on ropes and doesn't copy the accumulated text on every concatenation.

(Could also use a string list and concatenate at the very end, but BatText is cleaner)
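To illustrate the difference, here is a minimal sketch using only the OCaml standard library. It is not the code from the patch (which uses BatText); the fragments function and the three concat_* names are hypothetical, made up for this example. It compares repeated (^) with the string-list alternative mentioned above and with a Buffer.

```ocaml
(* Hypothetical stand-in for a stream of RAW token fragments;
   not taken from tokenizer.ml. Requires OCaml >= 4.06 for List.init. *)
let fragments n = List.init n (fun i -> string_of_int i ^ ";")

(* Quadratic: every (^) allocates a fresh string and copies the whole
   accumulator, so N fragments cost O(N^2) bytes copied in total. *)
let concat_naive frags =
  List.fold_left (fun acc s -> acc ^ s) "" frags

(* Linear: cons each fragment onto a list (newest first), then reverse
   and join once at the very end. *)
let concat_list frags =
  let rev_acc = List.fold_left (fun acc s -> s :: acc) [] frags in
  String.concat "" (List.rev rev_acc)

(* Amortised linear: append into a growable buffer. *)
let concat_buffer frags =
  let buf = Buffer.create 1024 in
  List.iter (Buffer.add_string buf) frags;
  Buffer.contents buf
```

All three functions return the same string; only the amount of copying and allocation differs. A rope such as BatText reaches a similar effect by deferring the copy until the text is finally converted back to a string.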

darioteixeira commented 9 years ago

Thanks @edwintorok. I'm merging this now. Note that there are bound to be many other opportunities for such optimisations...