david-dick / firefox-marionette

This is a client module to automate the Mozilla Firefox browser via the Marionette protocol
https://metacpan.org/dist/Firefox-Marionette

json method fails on documents with non-ASCII characters #6

Closed: eserte closed this issue 3 years ago

eserte commented 3 years ago

The following script tries to fetch and decode a JSON document, but fails:

#!/usr/bin/env perl

use strict;
use warnings;
use v5.10.0;
use Firefox::Marionette;
use Data::Dumper;

my $fm = Firefox::Marionette->new;
$fm->go("https://eprel.ec.europa.eu/api/products/dishwashers2019/543834");
say Dumper($fm->json);
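# Workaround for the problem (also requires "use JSON::XS;"):
# my $json = $fm->strip;                   # document text as a character string
# utf8::encode($json);                     # convert the characters to UTF-8 octets
# $json = JSON::XS::decode_json($json);    # decode_json expects octets
# say Dumper($json);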

__END__

Output is:

malformed UTF-8 character in JSON string, at character offset 193 (before "\x{fffd}","postalCod...") at /opt/perl-5.30.3/lib/site_perl/5.30.3/Firefox/Marionette.pm line 6391.

The problem seems to be that the document content is returned as a character string, but JSON::XS's decode_json expects UTF-8 encoded octets. This works as long as the input contains only ASCII characters, but fails as soon as non-ASCII ("wide") characters appear. Explicitly encoding the characters into UTF-8 octets, either with utf8::encode or another function (Encode::str2bytes would also work), fixes the problem, and this should probably be built into the json() method.
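A minimal sketch of the characters-versus-octets mismatch described above. It uses the core module JSON::PP instead of JSON::XS (an assumption made only so the example runs without extra installs; both modules' decode_json expect UTF-8 octets), and a hard-coded JSON string stands in for the text returned by $fm->strip:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP ();

# A JSON document as a *character* string, standing in for the page text:
# "\x{fc}" is the single character "ü" (U+00FC), not its two UTF-8 octets.
my $chars = qq({"city":"M\x{fc}nchen"});

# decode_json() expects UTF-8 encoded octets, so the lone "\x{fc}" is
# treated as an invalid UTF-8 byte and decoding dies.
my $ok = eval { JSON::PP::decode_json($chars); 1 };
print 'decoding characters: ', ( $ok ? 'succeeded' : 'failed' ), "\n";

# Encode the characters into UTF-8 octets first (on a copy), as the
# workaround in the script does with utf8::encode().
my $octets = $chars;
utf8::encode($octets);
my $data = JSON::PP::decode_json($octets);

# The decoded value is a character string again: "München" has 7 characters.
print 'city has ', length( $data->{city} ), " characters\n";
```

The first decode fails and the second succeeds, which matches the behaviour of the workaround line in the original script.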

david-dick commented 3 years ago

I've decided to put the encoding into the strip method as it assumes UTF-8 encoding as well. Thanks for all the comments and bug reports. Most appreciated. I'm planning on a new release in a week or so.
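A rough sketch of the split of responsibilities described in the reply, where strip performs the UTF-8 encoding so that json can hand its result straight to decode_json. The subroutines hypothetical_strip and hypothetical_json are illustrative stand-ins, not Firefox::Marionette's actual implementation; they operate on a plain string rather than a browser session, and JSON::PP is again assumed in place of JSON::XS:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical stand-in for strip(): receives the page text as a
# character string and returns it encoded as UTF-8 octets.
sub hypothetical_strip {
    my ($page_text) = @_;
    my $octets = $page_text;    # copy so the caller's string is untouched
    utf8::encode($octets);      # characters -> UTF-8 octets
    return $octets;
}

# Hypothetical stand-in for json(): because the strip step already
# returns octets, its result can go straight into decode_json().
sub hypothetical_json {
    my ($page_text) = @_;
    return JSON::PP::decode_json( hypothetical_strip($page_text) );
}

my $data = hypothetical_json(qq({"city":"M\x{fc}nchen"}));
print $data->{city} eq "M\x{fc}nchen" ? "round trip ok\n" : "mismatch\n";
```

Keeping the encoding in one place means every caller of strip gets the same UTF-8 assumption, and json needs no separate encoding step.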