Closed (petdance closed this 1 year ago)
Digging more in to HTML::Escape, I see that its encoding rules aren't identical to HTML::Entities' defaults. Not saying that's a problem, just a difference.
use feature 'say';
use HTML::Entities ();
use HTML::Escape ();
# 0x22 " 0x26 & 0x27 ' 0x3c < 0x3e > 0x60 ` 0x7b { 0x7d }
my $str = qq{Let's go to "<Dave> `&` {Buster's}"};
say 'raw: : ', $str;
say 'HTML::Entities: ', HTML::Entities::encode_entities($str);
say 'HTML::Escape : ', HTML::Escape::escape_html($str);
$ perl escape-vs-entities.pl
raw: : Let's go to "<Dave> `&` {Buster's}"
HTML::Entities: Let&#39;s go to &quot;&lt;Dave&gt; `&amp;` {Buster&#39;s}&quot;
HTML::Escape : Let&#39;s go to &quot;&lt;Dave&gt; &#96;&amp;&#96; &#123;Buster&#39;s&#125;&quot;
I like faster. But that leaves me having to decide whether the two modules are interchangeable. If we can make it optional, I'd be up for that PR.
Excellent, I'll get on it.
To be clear, are you saying that if someone has HTML::Escape installed, TT would still need to be explicitly told to use HTML::Escape?
Or if HTML::Escape exists, should TT use it, and if not, fall back to HTML::Entities? The current implementation is for the html_entity filter to look for Apache::Util, and then HTML::Entities, and then it fails.
If we need to specify to use HTML::Escape explicitly, how do you see that being done? Somewhere in the Template::Context, controlled with a parameter passed to the constructor? An argument in the FILTERS => { } parameter?
Thanks Andy, I'm all in favour of this.
I'm inclined to suggest that it should use the first of HTML::Escape, Apache::Util or HTML::Entities that it can find. So we bump HTML::Escape up to the front of the priority queue.
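The "first one it can find" order suggested here can be sketched as a small probe loop. This is only a sketch, not the code in Template::Filters: `first_available` is a hypothetical helper, and the function names paired with each module are assumptions about which routine the filter would call.

```perl
use strict;
use warnings;

# Hypothetical helper: return a code ref for the first candidate whose
# module loads and defines the named function, or undef if none does.
sub first_available {
    my @candidates = @_;    # list of [ $module, $function_name ] pairs
    for my $pair (@candidates) {
        my ($module, $function) = @$pair;
        (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
        next unless eval { require $file; 1 };     # skip modules that fail to load
        my $code = $module->can($function);        # look the function up by name
        return $code if $code;
    }
    return;
}

# Priority order proposed in this thread: HTML::Escape first.
my $encoder = first_available(
    [ 'HTML::Escape',   'escape_html'     ],
    [ 'Apache::Util',   'escape_html'     ],
    [ 'HTML::Entities', 'encode_entities' ],
);

print defined $encoder
    ? "found an encoder\n"
    : "no HTML entity encoder installed\n";
```

Because the probe stops at the first module that loads, anyone with HTML::Escape installed would get its wider set of escaped characters without asking for it, which is the compatibility question raised elsewhere in this thread.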
Perhaps there could be a note in the documentation saying that if you need to choose a specific implementation then you can pre-set the corresponding value in $Template::Filters::AVAILABLE, e.g.
$Template::Filters::AVAILABLE->{ HTML_ENTITY } = \&HTML::Entities::encode_entities
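A pre-set value can only win if the filter probes for modules solely when that slot is still empty. The sketch below illustrates that assumption with stand-ins: `$available`, `probe_for_encoder` and `html_entity_encoder` are hypothetical names, not Template::Filters internals.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash behind $Template::Filters::AVAILABLE.
my $available = {};
my $probes    = 0;    # counts how often auto-detection runs

# Assumed auto-detection step; here it just returns a trivial encoder.
sub probe_for_encoder {
    $probes++;
    return sub { my $text = shift; $text =~ s/&/&amp;/g; return $text };
}

# Only probe when nothing has been chosen yet (the assumption being illustrated).
sub html_entity_encoder {
    return $available->{HTML_ENTITY} ||= probe_for_encoder();
}

# A caller pre-sets the slot, so the probe never runs.
$available->{HTML_ENTITY} = sub { return uc $_[0] };
my $encoder = html_entity_encoder();
print $encoder->('a & b'), "\n";    # handled by the preset, not the probed encoder
```

If the real filter follows this pattern, the pre-set has to happen before the html_entity filter is first used, since the first lookup fixes the choice.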
I'm now thinking bigger.
I think we need an HTML::Entities::Fast (or ::XS) that is a front-end to HTML::Entities, but replaces HTML::Entities::encode_entities with a copy of HTML::Escape::escape_html, modified to match HTML::Entities::encode_entities by replacing only the five values it replaces, and without the optional list of replacements that the stock encode_entities takes.
Once that exists, TT can use either HTML::Entities::Fast or HTML::Entities, and the switch will be transparent to the user, because the html_entity filter does not use the optional list of replacements feature.
Also, having HTML::Entities::Fast lets the user use the new encode_entities in all the non-TT places that they need faster encoding.
My plan is to build HTML::Entities::Fast from the C code in HTML::Escape, modifying which entities it encodes. Once that exists, I'll move on to getting TT to use it.
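The behaviour being proposed for the fast encoder can be modelled in pure Perl. This is a sketch of the intended semantics only, not the planned C code: `encode_five` is a hypothetical name, and it takes the thread's "five values" framing at face value, so it ignores the control and high-bit characters that the stock encode_entities also encodes by default.

```perl
use strict;
use warnings;

# The five characters this thread says encode_entities replaces by default,
# mapped to the entities HTML::Entities emits for them.
my %ENTITY_FOR = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# Hypothetical pure-Perl model of the proposed function: one argument,
# no optional "characters to encode" list, returns an encoded copy.
sub encode_five {
    my ($text) = @_;
    return unless defined $text;
    $text =~ s/([&<>"'])/$ENTITY_FOR{$1}/g;
    return $text;
}

print encode_five(q{Let's go to "<Dave> `&` {Buster's}"}), "\n";
```

Unlike HTML::Escape, this leaves the backtick and curly braces alone, which is the difference the proposed module would need to preserve.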
HTML::Escape and HTML::Entities are not compatible. HTML::Escape only does a small subset of the conversion that HTML::Entities does.
HTML::Escape is an HTML entity encoder that is much faster than HTML::Entities. I'd like to supply a patch that would let TT use it, just as it already tries to use Apache::Util.
I threw together a benchmark where a simple TTML file of
is run with html_entity in the stock TT, and then again with a filter built on HTML::Escape. The template is processed 20,000 times, and is cached to avoid the overhead of recompiling. The difference is clear.
My plan is to just add HTML::Escape as an option in Template::Filters like we have use_apache_util, but it would be use_html_escape. HTML::Escape would be the preferred encoder, if it exists.
Is this something that you'd like to adopt? I'll get on it as soon as you say yes.