MickaelWalter / wp-json-scraper

Scrapes WordPress data using the WP-JSON API activated by default since WordPress 4.7
MIT License

Json Decoder Error #2

Closed serhatgksu closed 4 years ago

serhatgksu commented 4 years ago

How can I solve this?

python .\WPJsonScraper.py -i "https://guvenliksistemleri.net"

Traceback (most recent call last):
  File ".\WPJsonScraper.py", line 365, in <module>
    main()
  File ".\WPJsonScraper.py", line 231, in main
    basic_info = scanner.get_basic_info()
  File "C:\Users\User\Desktop\wp-json-scraper-master\wp-json-scraper-master\lib\wpapi.py", line 89, in get_basic_info
    self.basic_info = req.json().read().decode('utf-8-sig')
  File "C:\python3\lib\site-packages\requests\models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\python3\lib\json\__init__.py", line 337, in loads
    raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",
json.decoder.JSONDecodeError: Unexpected UTF-8 BOM (decode using utf-8-sig): line 1 column 1 (char 0)

MickaelWalter commented 4 years ago

Hi @serhatgksu,

It is as Python tells you: the website you are trying to access sends a BOM (byte order mark, a preamble for UTF-8), and the hardcoded encoding assumed by wp-json-scraper does not expect one.

See here about BOM: https://en.wikipedia.org/wiki/Byte_order_mark

I consider this a bug in my code, because it assumes that the document returned by WordPress is plain UTF-8 without a BOM (which is true in most cases). I'll look for a more proper way to handle this in a few hours. Keep an eye on the master branch.
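For anyone hitting the same error, here is a minimal sketch of the problem and one possible workaround. It is not necessarily the change that will land on master. The helper name `load_json_body` and the sample body are made up for illustration; the only real pieces are the standard library's `json` module and the `utf-8-sig` codec that the error message itself recommends.

```python
import codecs
import json


def load_json_body(raw: bytes) -> object:
    """Parse a JSON HTTP body that may or may not start with a UTF-8 BOM.

    Hypothetical helper for illustration, not part of wp-json-scraper.
    """
    # 'utf-8-sig' drops a leading BOM if present and otherwise
    # behaves like plain 'utf-8'.
    return json.loads(raw.decode("utf-8-sig"))


# Simulated response body: UTF-8 BOM followed by a small JSON document.
body = codecs.BOM_UTF8 + b'{"name": "Example site"}'

# Decoding as plain UTF-8 keeps the BOM as U+FEFF at the start of the
# string, and json.loads refuses such a string.
try:
    json.loads(body.decode("utf-8"))
    bom_rejected = False
except json.JSONDecodeError:
    bom_rejected = True

print(bom_rejected)
print(load_json_body(body))
print(load_json_body(b'{"name": "Example site"}'))
```

With requests, the same idea would mean decoding the raw bytes in `req.content` yourself instead of calling `req.json()`, assuming the server really sends UTF-8.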

MickaelWalter commented 4 years ago

This should now work fine on master, including for UTF-8 documents that start with a BOM.

Could you confirm? Thank you.