dankogai / p5-encode

Encode - character encodings (for Perl 5.8 or better)
https://metacpan.org/release/Encode

Add "euc-cn" => "EUC-CN" alias to Encode::MIME::Name #124

Closed: pypt closed this issue 7 years ago

pypt commented 7 years ago

LWP::UserAgent uses IO::HTML to determine an HTML page's encoding, then calls the mime_name() method of the returned Encode::Encoding object.

This fails for Chinese web pages encoded in GB2312. IO::HTML identifies the encoding of such pages as "euc-cn", but Encode::MIME::Name has no mapping from "euc-cn" to "EUC-CN", so the subsequent content decoding fails.
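The missing mapping can be checked locally without fetching a page. The following is a minimal sketch that uses only Encode's find_encoding(), name() and mime_name(); whether it prints a MIME name or "(undef)" depends on the installed Encode version.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Encode ();

# Look up the encoding object by the name IO::HTML reports.
my $enc = Encode::find_encoding('euc-cn')
    or die "Encode does not know 'euc-cn'";

# mime_name() consults Encode::MIME::Name; it returns undef
# when there is no entry for the encoding's canonical name.
my $mime = $enc->mime_name;

printf "Encode name: %s\n", $enc->name;
printf "MIME name:   %s\n", defined $mime ? $mime : '(undef)';
```

If the second line prints "(undef)", the installed Encode lacks the alias described in this issue.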

To reproduce:

#!/usr/bin/env perl

use strict;
use warnings;
use v5.10;

use LWP::UserAgent;

my $url = 'https://sandbox.pypt.lt/gb2312.html';

my $ua = LWP::UserAgent->new;
my $response = $ua->get($url);

if ($response->is_success) {
    # Currently undef; should be "EUC-CN". The // fallback avoids an "uninitialized value" warning.
    say "Charset: " . ($response->content_charset() // 'undef');
    say $response->decoded_content();   # garbled content; should be decoded Chinese text
} else {
    die $response->status_line;
}

Adding a simple 'euc-cn' => 'EUC-CN' mapping to Encode::MIME::Name fixes both problems.
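Until a fixed Encode is installed, the same mapping could in principle be added at runtime. This is only a sketch: it assumes the lookup table is the package variable %Encode::MIME::Name::MIME_NAME_OF, which is an internal detail rather than a documented interface and may differ between Encode versions.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use v5.10;

use Encode ();
use Encode::MIME::Name ();

# Assumption: Encode::MIME::Name keeps its table in %MIME_NAME_OF.
# Add the alias only if the installed version does not already have it.
$Encode::MIME::Name::MIME_NAME_OF{'euc-cn'} //= 'EUC-CN';

my $patched_mime = Encode::find_encoding('euc-cn')->mime_name;
say "MIME name: " . ($patched_mime // '(undef)');
```

Patching a module's internals this way is fragile; upgrading to an Encode release that ships the mapping is the cleaner fix.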

dankogai commented 7 years ago

Thank you!