dankogai / p5-encode

Encode - character encodings (for Perl 5.8 or better)
https://metacpan.org/release/Encode

deep recursion in Encode::find_encoding when decoding bad MIME header #127

Closed ntyni closed 6 years ago

ntyni commented 6 years ago

As reported by Jakub Wilk in https://bugs.debian.org/880085

perl -MEncode -e 'Encode::decode("MIME-Header", "=?U".("_"x200)."?Q??=")'

gives the deep recursion warnings below on Perl 5.26.1 with Encode 2.93 (and also with 2.88, as bundled with 5.26.1).

Deep recursion on subroutine "Encode::find_encoding" at /usr/lib/x86_64-linux-gnu/perl5/5.26/Encode/Alias.pm line 44.
Deep recursion on subroutine "Encode::getEncoding" at /usr/lib/x86_64-linux-gnu/perl5/5.26/Encode.pm line 152.
Deep recursion on subroutine "Encode::Alias::find_alias" at /usr/lib/x86_64-linux-gnu/perl5/5.26/Encode.pm line 144.
Deep recursion on subroutine "Encode::Alias::find_alias" at /usr/lib/x86_64-linux-gnu/perl5/5.26/Encode.pm line 144.
Deep recursion on subroutine "Encode::find_encoding" at /usr/lib/x86_64-linux-gnu/perl5/5.26/Encode/Alias.pm line 44.
Deep recursion on subroutine "Encode::getEncoding" at /usr/lib/x86_64-linux-gnu/perl5/5.26/Encode.pm line 152.
Deep recursion on subroutine "Encode::Alias::find_alias" at /usr/lib/x86_64-linux-gnu/perl5/5.26/Encode.pm line 144.
Deep recursion on subroutine "Encode::find_encoding" at /usr/lib/x86_64-linux-gnu/perl5/5.26/Encode/Alias.pm line 44.
Deep recursion on subroutine "Encode::getEncoding" at /usr/lib/x86_64-linux-gnu/perl5/5.26/Encode.pm line 152.
Deep recursion on subroutine "Encode::Alias::find_alias" at /usr/lib/x86_64-linux-gnu/perl5/5.26/Encode.pm line 144.

pali commented 6 years ago

This is not a problem in MIME-Header, but in Encode::find_encoding. Here is a simple reproducer:

$ perl -MEncode -e 'Encode::find_encoding("U".("_"x200))'
Deep recursion on subroutine "Encode::find_encoding" at /usr/lib/x86_64-linux-gnu/perl/5.24/Encode/Alias.pm line 46.
Deep recursion on subroutine "Encode::getEncoding" at /usr/lib/x86_64-linux-gnu/perl/5.24/Encode.pm line 130.
Deep recursion on subroutine "Encode::Alias::find_alias" at /usr/lib/x86_64-linux-gnu/perl/5.24/Encode.pm line 112.
Deep recursion on subroutine "Encode::Alias::find_alias" at /usr/lib/x86_64-linux-gnu/perl/5.24/Encode.pm line 114.
Deep recursion on subroutine "Encode::find_encoding" at /usr/lib/x86_64-linux-gnu/perl/5.24/Encode/Alias.pm line 46.
Deep recursion on subroutine "Encode::getEncoding" at /usr/lib/x86_64-linux-gnu/perl/5.24/Encode.pm line 130.
Deep recursion on subroutine "Encode::Alias::find_alias" at /usr/lib/x86_64-linux-gnu/perl/5.24/Encode.pm line 112.
Deep recursion on subroutine "Encode::find_encoding" at /usr/lib/x86_64-linux-gnu/perl/5.24/Encode/Alias.pm line 46.
Deep recursion on subroutine "Encode::getEncoding" at /usr/lib/x86_64-linux-gnu/perl/5.24/Encode.pm line 130.
Deep recursion on subroutine "Encode::Alias::find_alias" at /usr/lib/x86_64-linux-gnu/perl/5.24/Encode.pm line 112.

pali commented 6 years ago

Attached is the debug output from:

$ PERL_ENCODE_DEBUG=1 perl -MEncode -e 'Encode::find_encoding("U".("_"x200))'

debug.log

It looks very strange... @dankogai Any idea what Encode::Alias::find_alias is doing here?

dankogai commented 6 years ago

Okay, got it. This is the offending regexp in Encode::Alias:

define_alias( qr/^(\S+)[\s_]+(.*)$/i => '"$1-$2"' );

Because \S+ DOES MATCH _, each _ gets replaced one by one, causing unnecessary recursion.
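
To illustrate the one-by-one replacement, the rule can be simulated outside Encode with a plain loop. This is only a sketch: `rewrite_once` and `count_rewrites` are hypothetical helpers, not part of Encode, and the loop stands in for the find_encoding / find_alias recursion rather than reproducing it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Encode): apply the old alias rule once.
# Returns the rewritten name, or undef when the pattern does not match.
sub rewrite_once {
    my ($name) = @_;
    return $name =~ /^(\S+)[\s_]+(.*)$/i ? "$1-$2" : undef;
}

# Hypothetical helper: count how many times the rule fires before the name
# stops matching. Each firing stands for one more round of alias lookup.
sub count_rewrites {
    my ($name) = @_;
    my $steps = 0;
    while ( defined( my $next = rewrite_once($name) ) ) {
        $name = $next;
        $steps++;
    }
    return $steps;
}

# Same name as in the reproducer: "U" followed by 200 underscores.
print count_rewrites( "U" . ( "_" x 200 ) ), "\n";
```

The greedy \S+ swallows every underscore except the last one it has to give back, so only one _ becomes - per pass and the reproducer name needs 200 passes. Perl emits the "Deep recursion" warning once a subroutine is nested 100 calls deep, which is why this input trips it.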

And the fix is below:

--- a/lib/Encode/Alias.pm
+++ b/lib/Encode/Alias.pm
@@ -270,7 +270,7 @@ sub init_aliases {
     define_alias( qr/\bUTF-8$/i => '"utf-8-strict"' );

     # At last, Map white space and _ to '-'
-    define_alias( qr/^(\S+)[\s_]+(.*)$/i => '"$1-$2"' );
+    define_alias( qr/^([^\s_]+)[\s_]+([^\s_]*)$/i => '"$1-$2"' );
 }

 1;
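
As a rough check of the patch, the two patterns can be compared on the reproducer name without loading Encode. This is a sketch under the assumption that repeated alias lookup behaves like a simple rewrite loop; `settle` is a hypothetical helper, not an Encode function.

```perl
use strict;
use warnings;

my $old = qr/^(\S+)[\s_]+(.*)$/i;             # rule before the patch
my $new = qr/^([^\s_]+)[\s_]+([^\s_]*)$/i;    # rule after the patch

# Hypothetical helper (not part of Encode): apply $rule repeatedly and
# return the final name plus the number of rewrites it took.
sub settle {
    my ( $rule, $name ) = @_;
    my $steps = 0;
    while ( $name =~ $rule ) {
        $name = "$1-$2";
        $steps++;
    }
    return ( $name, $steps );
}

my $input = "U" . ( "_" x 200 );
for my $case ( [ old => $old ], [ new => $new ] ) {
    my ( $label, $rule ) = @$case;
    my ( $final, $steps ) = settle( $rule, $input );
    printf "%s: %d rewrite(s)\n", $label, $steps;
}
```

With the old pattern the loop runs 200 times; with the new one the whole run of underscores is consumed by [\s_]+ in a single pass, leaving "U-", which no longer matches.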

dankogai commented 6 years ago

Pushed the fix. Closing. Thank you all for finding this!