abw / Template2

Perl Template Toolkit v2
http://template-toolkit.org/

The join vmethod for lists produces garbled output if its argument is utf8 #69

Open · redneb opened this issue 6 years ago

redneb commented 6 years ago

Consider the following template file (call it test.tmpl):

```
[% a=['φοο','βαρ']; a.join('•') %]
[% a=['foo','bar']; a.join('•') %]
```

and the following Perl script that uses the above template:

```perl
#!/usr/bin/perl
use Template;
binmode STDOUT,'utf8';
Template->new(ENCODING=>'utf8')->process('test.tmpl');
```

The script produces the following output:

φοο•βαρ
fooâ¢bar

As you can see, the separator is garbled in the second line but correct in the first. If I define my own custom list vmethod that simply calls Perl's join, I get the correct output:

φοο•βαρ
foo•bar

dracos commented 6 years ago

This only happens when using Template::Stash::XS, not Template::Stash (the bug is in the C code). It affects every separator that is appended immediately after a string without the SvUTF8 flag, up until the first SvUTF8-flagged string. For example, in [% a=['foo','φοο','βαρ']; a.join('•') %] the first • comes out garbled just like in the second line above, while the second one is fine because by then the output string has had the SvUTF8 flag turned on.

The code at issue is https://github.com/abw/Template2/blob/4c602d0b9577ff87172a420607663cdb72146211/xs/Stash.xs#L1028-L1058. I'm not an XS expert by any means, but I assume that if the join string weren't converted to a char*, it wouldn't lose its UTF-8 state. (It looks like Perl 5.16 added something that makes this sort of thing easier: "sv_catpvn_flags takes a couple of new internal-only flags, SV_CATBYTES and SV_CATUTF8, which tell it whether the char array to be concatenated is UTF8. This allows for more efficient concatenation than creating temporary SVs to pass to sv_catsv." I don't know whether that can be used here if the module still has to support older perls, but something along those lines.)