alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python
MIT License
6.16k stars, 648 forks

Add support for specifying text encoding. #44

Closed: RealXuChe closed this issue 3 years ago

RealXuChe commented 3 years ago

I'm working with a legacy Chinese site that uses BIG5 text encoding, and I can't set the text encoding by passing arguments through request_args, because requests doesn't support that.

So the results I get are garbled, like this: '¡¸ÔÚÕâ¸öÊÀ½ç¸æÖÕÒÔÇ°©¤©¤A¡¹-promise/result-'.
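For readers hitting the same symptom, here is a minimal sketch of how this kind of garbling can arise. It assumes the page bytes are in a Chinese multi-byte encoding (GB2312 in this sketch) and get decoded as ISO-8859-1, which is the fallback requests uses for text responses that declare no charset. The sample string is an illustration only, not taken from the site.

```python
# Sample text chosen for illustration; any GB2312-encodable string works.
original = "在这个世界"

# What the server would send: the text encoded as GB2312 bytes.
raw = original.encode("gb2312")

# Decoding with the wrong codec does not raise an error, because every
# byte value is a valid ISO-8859-1 character. It silently yields mojibake.
garbled = raw.decode("iso-8859-1")

# Decoding with the correct codec recovers the text.
recovered = raw.decode("gb2312")

# The damage is reversible as long as the wrong codec kept every byte:
# re-encode with the wrong codec, then decode with the right one.
repaired = garbled.encode("iso-8859-1").decode("gb2312")

print(garbled)
print(recovered)
print(repaired)
```

Because the wrong decode is lossless here, already-scraped garbled strings can often be repaired after the fact, but setting the right encoding before decoding is the cleaner fix.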

The encoding can only be set by writing to the encoding property of the requests response object (according to this).

So maybe adding an encoding param and setting the encoding in _get_soup in auto_scraper.py would be a good idea.
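A rough sketch of what such an encoding parameter might look like. This is not the library's actual code or the fix that was later released; `response_text` and `get_html` are hypothetical names used only for illustration. It relies on the fact that requests decodes `response.text` using whatever `response.encoding` is set to at the time of access.

```python
from typing import Any, Dict, Optional


def response_text(response: Any, encoding: Optional[str] = None) -> str:
    """Return the decoded body of a requests-style response.

    Hypothetical helper: if `encoding` is given, it overrides whatever
    encoding was guessed from the HTTP headers.
    """
    if encoding is not None:
        # `text` is decoded lazily, so setting `encoding` first is enough.
        response.encoding = encoding
    return response.text


def get_html(url: str, encoding: Optional[str] = None,
             request_args: Optional[Dict[str, Any]] = None) -> str:
    """Fetch a page and decode it, optionally forcing the encoding."""
    import requests  # third-party; imported lazily so the helper above stays standalone

    response = requests.get(url, **(request_args or {}))
    return response_text(response, encoding)
```

Something like `get_html(url, encoding="gb2312")` would then return correctly decoded HTML, while leaving `encoding` unset keeps the current behaviour.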

alirezamika commented 3 years ago

Yeah, thanks. For now you can pass the HTML content via the html parameter. Can you share the website?
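The workaround above could look roughly like this: download the raw bytes, decode them yourself, and hand the resulting string to AutoScraper through `html`. This is a sketch under assumptions. `fetch_bytes`, `decode_page`, and `build_from_html` are hypothetical helper names, the User-Agent header is an arbitrary choice, and the wanted_list contents depend on the page being scraped.

```python
from typing import List
from urllib.request import Request, urlopen


def fetch_bytes(url: str) -> bytes:
    """Download a page and return its undecoded body."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read()


def decode_page(raw: bytes, encoding: str) -> str:
    """Decode page bytes with an explicitly chosen encoding."""
    # errors="replace" substitutes U+FFFD for any stray bytes instead of raising.
    return raw.decode(encoding, errors="replace")


def build_from_html(url: str, encoding: str, wanted_list: List[str]):
    """Build scraping rules from HTML that was decoded by hand."""
    from autoscraper import AutoScraper  # third-party; imported lazily

    html = decode_page(fetch_bytes(url), encoding)
    scraper = AutoScraper()
    # Passing `html` means AutoScraper parses this string instead of fetching the page.
    result = scraper.build(html=html, wanted_list=wanted_list)
    return scraper, result
```

For GB2312 pages, "gbk" or "gb18030" may be a safer choice of codec, since both are supersets of GB2312 and tolerate characters outside the original set.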

RealXuChe commented 3 years ago

Here's the website: https://www.wenku8.net/novel/2/2231/index.htm Also, I made a mistake: the encoding of this site is GB2312, not BIG5.

alirezamika commented 3 years ago

Should be fixed now. (v1.1.12)