abw / Template2

Perl Template Toolkit v2
http://template-toolkit.org/

Speed up html_entity by optionally using HTML::Escape #304

Closed petdance closed 1 year ago

petdance commented 1 year ago

HTML::Escape is an HTML entity encoder that is much faster than HTML::Entities. I'd like to supply a patch that would let TT use it when it is installed, the same way TT already prefers Apache::Util when that is installed.

I threw together a benchmark where a simple TTML file of

[% name | html_entity %]. (repeated 20 times in the file)

is run with html_entity in the stock TT, and then again with a filter built on HTML::Escape. The template is processed 20,000 times, and is cached to avoid the overhead of recompiling. The HTML::Escape filter comes out nearly twice as fast (95% faster):

             Rate   filtered fastfilter
filtered   3096/s         --       -49%
fastfilter 6024/s        95%         --
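
For anyone who wants to reproduce the comparison, here is a rough sketch of how such a benchmark could be set up. It is not the script that produced the numbers above: the helper names (`build_template_text`, `run_benchmark`), the `fast_html` filter name, and the sample `name` value are assumptions, and it precompiles both templates through TT's `BLOCKS` option instead of using a cached template file. It returns without doing anything unless Template, HTML::Entities, and HTML::Escape are all installed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: the template body described above, one filtered
# variable per line, repeated $repeats times.
sub build_template_text {
    my ($repeats) = @_;
    return join '', ("[% name | html_entity %].\n") x $repeats;
}

# Hypothetical helper: compare the stock html_entity filter against a
# filter backed by HTML::Escape. Returns nothing if a module is missing.
sub run_benchmark {
    my ($iterations) = @_;
    return unless eval {
        require Template;
        require HTML::Entities;
        require HTML::Escape;
        1;
    };

    my $text = build_template_text(20);
    (my $fast_text = $text) =~ s/html_entity/fast_html/g;

    # BLOCKS given as text are compiled once, when the object is built,
    # so the timed loop does not pay for recompilation.
    my $tt = Template->new(
        BLOCKS  => { filtered => $text, fastfilter => $fast_text },
        FILTERS => { fast_html => \&HTML::Escape::escape_html },
    ) or die Template->error;

    my $vars   = { name => q{Let's go to "<Dave>"} };
    my $render = sub {
        my ($block) = @_;
        my $output = '';
        $tt->process($block, $vars, \$output) or die $tt->error;
        return $output;
    };

    # cmpthese prints a rate table in the same layout as the one above.
    return cmpthese($iterations, {
        filtered   => sub { $render->('filtered') },
        fastfilter => sub { $render->('fastfilter') },
    });
}
```

Calling `run_benchmark(20_000)` would match the iteration count mentioned above; the absolute rates will differ by machine and module versions.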

My plan is to just add HTML::Escape as an option in Template::Filters like we have use_apache_util, but it would be use_html_escape. HTML::Escape would be the preferred encoder, if it exists.

Is this something that you'd like to adopt? I'll get on it as soon as you say yes.

petdance commented 1 year ago

Digging more into HTML::Escape, I see that its encoding rules aren't identical to HTML::Entities' defaults. Not saying that's a problem, just a difference.

#  0x22 "   0x26 &   0x27 '   0x3c <   0x3e >   0x60 `   0x7b {   0x7d }

my $str = qq{Let's go to "<Dave> `&` {Buster's}"};
say 'raw:          : ', $str;
say 'HTML::Entities: ', HTML::Entities::encode_entities($str);
say 'HTML::Escape  : ', HTML::Escape::escape_html($str);

$ perl escape-vs-entities.pl
raw:          : Let's go to "<Dave> `&` {Buster's}"
HTML::Entities: Let&#39;s go to &quot;&lt;Dave&gt; `&amp;` {Buster&#39;s}&quot;
HTML::Escape  : Let&#39;s go to &quot;&lt;Dave&gt; &#96;&amp;&#96; &#123;Buster&#39;s&#125;&quot;
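
To make the difference visible without installing either module, here is a small pure-Perl sketch that mimics the two behaviours for the eight ASCII characters listed above. The names `%ESCAPE_SET`, `%ENTITIES_SET`, and `encode_with` are made up for illustration, and the sketch deliberately ignores everything else the real modules do (for example, how they treat non-ASCII or control characters).

```perl
use strict;
use warnings;

# The eight characters HTML::Escape rewrites in the output above.
my %ESCAPE_SET = (
    '"' => '&quot;', '&' => '&amp;',  "'" => '&#39;',
    '<' => '&lt;',   '>' => '&gt;',   '`' => '&#96;',
    '{' => '&#123;', '}' => '&#125;',
);

# The five of those that HTML::Entities rewrites in the output above.
my %ENTITIES_SET =
    map { ($_ => $ESCAPE_SET{$_}) } ('"', '&', "'", '<', '>');

# Hypothetical helper: replace every character that has an entry in
# $map, in a single pass so that inserted '&' are not re-encoded.
sub encode_with {
    my ($map, $text) = @_;
    my $class = join '', map { quotemeta } sort keys %$map;
    $text =~ s/([$class])/$map->{$1}/g;
    return $text;
}

my $str = qq{Let's go to "<Dave> `&` {Buster's}"};
print 'five chars : ', encode_with(\%ENTITIES_SET, $str), "\n";
print 'eight chars: ', encode_with(\%ESCAPE_SET,   $str), "\n";
```

The two printed lines should agree with the HTML::Entities and HTML::Escape lines in the run above: the backtick and the braces are the only characters on which the two sets disagree.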

toddr commented 1 year ago

I like faster. But that leaves me having to decide whether the two modules behave identically. If we can make it optional, I'd be up for that PR.

petdance commented 1 year ago

Excellent, I'll get on it.

To be clear, are you saying that if someone has HTML::Escape installed, TT would still need to be explicitly told to use it?

Or if HTML::Escape exists, should TT use it, and if not, fall back to HTML::Entities? The current implementation is for the html_entity filter to look for Apache::Util, then HTML::Entities, and then fail.

If we need to specify to use HTML::Escape explicitly, how do you see that being done? Somewhere in the Template::Context, controlled with a parameter passed to the constructor? An argument in the FILTERS => { } parameter?
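
For reference, an explicit opt-in is already possible per Template object today by overriding the filter through the existing `FILTERS` option. This is only a sketch of that workaround, not a proposal for the final interface; `render_with_html_escape` is a made-up helper name, and it returns undef when Template or HTML::Escape is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper: render $template_text with html_entity backed by
# HTML::Escape. Returns undef if either module cannot be loaded.
sub render_with_html_escape {
    my ($template_text, $vars) = @_;
    return unless eval { require Template; require HTML::Escape; 1 };

    # A filter named in FILTERS takes the place of the built-in filter
    # of the same name for this Template object only.
    my $tt = Template->new(
        FILTERS => { html_entity => \&HTML::Escape::escape_html },
    ) or die Template->error;

    my $output = '';
    $tt->process(\$template_text, $vars, \$output) or die $tt->error;
    return $output;
}

my $html = render_with_html_escape(
    '[% name | html_entity %]',
    { name => '<Dave>' },
);
print defined $html ? "$html\n" : "Template or HTML::Escape not installed\n";
```

The drawback is that every caller has to opt in by hand, which is why a switch inside Template::Filters is being discussed here.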

abw commented 1 year ago

Thanks Andy, I'm all in favour of this.

I'm inclined to suggest that it should use the first of HTML::Escape, Apache::Util or HTML::Entities that it can find. So we bump HTML::Escape up to the front of the priority queue.

Perhaps there could be a note in the documentation saying that if you need to choose a specific implementation then you can pre-set the corresponding value in $Template::Filters::AVAILABLE, e.g.

$Template::Filters::AVAILABLE->{ HTML_ENTITY } = \&HTML::Entities::encode_entities;
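
The lookup order suggested above could be sketched roughly as follows. This is a standalone illustration, not the code in Template::Filters: `@CANDIDATES` and `pick_html_encoder` are invented names, and the optional `$preset` argument only stands in for a pre-set `AVAILABLE` entry.

```perl
use strict;
use warnings;

# Candidate encoders in the proposed priority order: [ module, function ].
my @CANDIDATES = (
    [ 'HTML::Escape',   'HTML::Escape::escape_html'       ],
    [ 'Apache::Util',   'Apache::Util::escape_html'       ],
    [ 'HTML::Entities', 'HTML::Entities::encode_entities' ],
);

# Hypothetical helper: return ($module, $coderef) for the first usable
# encoder, or an empty list when none can be loaded. A caller-supplied
# $preset coderef wins over the search.
sub pick_html_encoder {
    my ($preset) = @_;
    return ('preset', $preset) if ref $preset eq 'CODE';

    for my $candidate (@CANDIDATES) {
        my ($module, $function) = @$candidate;
        (my $file = "$module.pm") =~ s{::}{/}g;
        next unless eval { require $file; 1 };

        no strict 'refs';
        next unless defined &{$function};
        my $code = \&{$function};

        # Probe with an empty string so that an encoder which loads but
        # cannot run in this environment is skipped.
        next unless eval { $code->(''); 1 };
        return ($module, $code);
    }
    return;
}

my ($module, $encoder) = pick_html_encoder();
print defined $module
    ? "using $module: " . $encoder->('<b>') . "\n"
    : "no HTML encoder installed\n";
```

Because the three encoders do not produce identical output for every character, which one gets picked can change what a template renders; that is the reason a documented override matters.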

petdance commented 1 year ago

I'm now thinking bigger.

I think we need an HTML::Entities::Fast (or ::XS) that is a front-end to HTML::Entities, but replaces HTML::Entities::encode_entities with a copy of HTML::Escape::escape_html modified to match it: the copy would replace only the five values that encode_entities replaces, and it would NOT take the optional list of replacements that the stock encode_entities takes.

Once that exists, then that lets TT use either HTML::Entities::Fast or HTML::Entities, and the user will see it transparently, because the html_entity filter does not use the optional list of replacements feature.

Also, having HTML::Entities::Fast lets the user use the new encode_entities in all the non-TT places that they need faster encoding.

My plan is to build HTML::Entities::Fast from the C code in HTML::Escape, modifying which entities it encodes. Once that exists, I'll move on to getting TT to use it.

petdance commented 1 year ago

HTML::Escape and HTML::Entities are not compatible. HTML::Escape only does a small subset of the conversion that HTML::Entities does.