ThomasAitken / Scrapy-Testmaster

The most advanced debugging and testing tool for Scrapy

Trying to inspect fixture.bin with cp1252 encoding throws a UnicodeDecodeError #10

Open mathvaillant opened 1 year ago

mathvaillant commented 1 year ago

I have a fixture encoded in cp1252. When I run testmaster inspect <spider> <callback> 1, it throws this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 1051: invalid continuation byte

This happens because the file I am trying to inspect is not UTF-8 encoded.
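A minimal sketch of the same failure outside testmaster, using only the standard library. The sample string is an illustrative assumption, not data from the actual fixture; it just contains a character that cp1252 encodes as the byte 0xe1.

```python
# "á" is a single byte (0xe1) in cp1252, but 0xe1 starts a
# three-byte sequence in UTF-8, so the UTF-8 decoder rejects it.
raw = "Olá mundo".encode("cp1252")

try:
    raw.decode("utf-8")
    error_reason = None
except UnicodeDecodeError as exc:
    # exc.reason holds the short description shown in the traceback.
    error_reason = exc.reason

# Decoding with the encoding the bytes were written in works.
text = raw.decode("cp1252")

print(error_reason)
print(text)
```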

I solved the issue locally by adding an optional positional argument to the inspect command:

cli.py/main - line 363
...
inspect_cmd.add_argument(
    'encoding',
    nargs='?',
    default='utf-8',
    help="Encoding of the fixtures file. Defaults to 'utf-8'."
)
....
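To illustrate how an optional positional argument with nargs='?' behaves, here is a self-contained sketch. The parser layout (the spider, callback and fixture arguments and the build_parser helper) is an assumption made for the example and is not copied from the real cli.py; only the encoding argument mirrors the snippet above.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the testmaster CLI parser.
    parser = argparse.ArgumentParser(prog="testmaster")
    subparsers = parser.add_subparsers(dest="command")

    inspect_cmd = subparsers.add_parser("inspect")
    inspect_cmd.add_argument("spider")
    inspect_cmd.add_argument("callback")
    inspect_cmd.add_argument("fixture")
    # nargs='?' makes the positional optional; default is used when omitted.
    inspect_cmd.add_argument(
        "encoding",
        nargs="?",
        default="utf-8",
        help="Encoding of the fixtures file. Defaults to 'utf-8'.",
    )
    return parser


parser = build_parser()
default_args = parser.parse_args(["inspect", "my_spider", "parse", "1"])
custom_args = parser.parse_args(["inspect", "my_spider", "parse", "1", "cp1252"])

print(default_args.encoding)
print(custom_args.encoding)
```

Because the argument is positional, it has to come last on the command line. A named option such as --encoding would be an alternative design that does not depend on argument order.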

Then I changed parse_data, get_fixture_data and inspect as follows:

    def parse_data(self, data):
        if isinstance(data, (dict, scrapy.Item)):
            return {
                self.parse_data(k): self.parse_data(v)
                for k, v in data.items()
            }
        elif isinstance(data, list):
            return [self.parse_data(x) for x in data]
        elif isinstance(data, bytes):
            # The error was happening here, because the encoding was falling back to utf-8.
            return to_unicode(data, encoding=self.args.encoding)
        elif isinstance(data, datetime):
            return data.isoformat()
        elif isinstance(data, (int, float)):
            return data
        return str(data)

    def get_fixture_data(self):
        with open(self.fixture_path, 'rb') as f:
            raw_data = f.read()

        encoding = self.args.encoding
        fixture_info = unpickle_data(decompress_data(raw_data), encoding)

        if 'fixture_version' in fixture_info:
            data = unpickle_data(fixture_info['data'], encoding)
        else:
            data = fixture_info  # legacy tests (not all will work, just utf-8)
        return data

    def inspect(self):
        data = self.parse_data(self.get_fixture_data())
        print(json.dumps(data))
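The recursive decoding idea can be tried without Scrapy or testmaster. The sketch below is a standalone approximation: to_jsonable is a hypothetical name, bytes.decode stands in for Scrapy's to_unicode, and the sample dictionary is made up for the example.

```python
import json
from datetime import datetime


def to_jsonable(data, encoding="utf-8"):
    """Recursively convert fixture-like data into JSON-serializable values."""
    if isinstance(data, dict):
        return {
            to_jsonable(k, encoding): to_jsonable(v, encoding)
            for k, v in data.items()
        }
    if isinstance(data, list):
        return [to_jsonable(x, encoding) for x in data]
    if isinstance(data, bytes):
        # Decode with the caller-supplied encoding instead of assuming utf-8.
        return data.decode(encoding)
    if isinstance(data, datetime):
        return data.isoformat()
    if isinstance(data, (int, float)):
        return data
    return str(data)


# Made-up sample: a cp1252-encoded key and value, plus a plain integer.
sample = {b"title": [b"Ol\xe1 mundo"], "status": 200}
decoded = to_jsonable(sample, encoding="cp1252")

print(json.dumps(decoded, ensure_ascii=False))
```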

Then, in the terminal, I just had to run:

testmaster inspect <spider> <callback> 1 cp1252

and it worked just fine.

Not sure if this is the best solution, but something like that would be super helpful!

Btw I am really enjoying working with testmaster 👌

ricardocouto-hydradev commented 1 year ago

Any news about this?