Avoid repeated copying in `clean_tokens`.

nnethercote commented 2 years ago

`clean_tokens` takes a vector of tokens and removes some of them. The decision of which tokens to remove is somewhat complicated, so it can't use `retain` or `filter`, and instead does everything itself. This includes calling `remove` for every individual removed token. Unfortunately, this results in quadratic-ish behaviour when the number of tokens removed is large, because each `remove` call shuffles all the later tokens down by one position. The closer the removed tokens are to the start of the vector, the worse it is.
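To make the cost concrete, here is a minimal sketch of the remove-one-at-a-time pattern. It is not the actual `clean_tokens` code: `remove_one_by_one`, the `u32` element type, and the predicate are hypothetical stand-ins, and the function simply counts how many elements `Vec::remove` has to shift.

```rust
/// Hypothetical stand-in for the old pattern (not minifier's real code):
/// removes every element matching `should_remove` with one `Vec::remove`
/// call per hit, and returns the total number of elements shifted.
fn remove_one_by_one(v: &mut Vec<u32>, should_remove: impl Fn(u32) -> bool) -> usize {
    let mut shifted = 0;
    let mut i = 0;
    while i < v.len() {
        if should_remove(v[i]) {
            // `Vec::remove(i)` moves every element after index `i`
            // one slot to the left.
            shifted += v.len() - i - 1;
            v.remove(i);
        } else {
            i += 1;
        }
    }
    shifted
}

fn main() {
    let mut v: Vec<u32> = (0..1000).collect();
    // Remove the first half of the vector, the worst region to remove from.
    let shifted = remove_one_by_one(&mut v, |t| t < 500);
    println!("{} elements left, {} element moves", v.len(), shifted);
}
```

With 1,000 elements and the first 500 removed, the shifts add up to 999 + 998 + ... + 500 = 374,750 element moves, far more than the 500 moves a single compaction pass would need.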

This commit switches to a two-pass approach. The first pass records which tokens should be removed, using an auxiliary vector of bools. The second pass does the removal in a single sweep, using `Vec::retain`. When running rustdoc (which uses minifier) on helloworld, this reduces memcpy traffic from this:

297,675,803 bytes in 869,856 blocks, avg size 342.21 bytes

to this:

179,253,407 bytes in 868,510 blocks, avg size 206.39 bytes
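For readers who want the shape of the two-pass idea without opening the diff, here is a rough sketch. It is an illustration only, not the code in this commit: `two_pass_clean`, the `char` token type, and the "drop a space that follows another space" rule are all assumptions made for the example.

```rust
/// Hypothetical two-pass cleanup (not the real `clean_tokens`):
/// drops a space token when the token before it is also a space.
fn two_pass_clean(tokens: &mut Vec<char>) {
    // Pass 1: record, per index, whether the token should be removed.
    // The decision may look at neighbouring tokens, which is why a plain
    // `retain` predicate on a single element is not enough on its own.
    let mut remove = vec![false; tokens.len()];
    for i in 1..tokens.len() {
        if tokens[i] == ' ' && tokens[i - 1] == ' ' {
            remove[i] = true;
        }
    }

    // Pass 2: `Vec::retain` visits each element exactly once, in order,
    // so an iterator over the flags stays in step with the tokens.
    let mut flags = remove.into_iter();
    tokens.retain(|_| !flags.next().unwrap());
}

fn main() {
    let mut tokens: Vec<char> = "a  b   c".chars().collect();
    two_pass_clean(&mut tokens);
    let cleaned: String = tokens.into_iter().collect();
    println!("{cleaned:?}");
}
```

The auxiliary vector costs one bool per token, but each surviving token is moved at most once, instead of once per removed token ahead of it.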

GuillaumeGomez commented 2 years ago

Very interesting approach! Thanks!

nnethercote commented 2 years ago

Thank you for the fast response. Would you be able to make a new release so we can take advantage of the speedup in rustdoc?

GuillaumeGomez commented 2 years ago

Already on it: #100. ;)

GuillaumeGomez / minifier-rs

Avoid repeated copying in `clean_tokens`. #98