VijayQin / DMHY-spider

This project crawls DMHY and stores the HTML page and torrent of each animation in the local file system and in a database (SQLite3). Future work includes filtering the animations we want by given rules and alerting us to those updated each day.

UnicodeEncodeError is thrown because of the Windows cmd 'gbk' codec problem #5

Closed. VijayQin closed this issue 8 years ago

VijayQin commented 8 years ago

When downloading the page https://share.dmhy.org/topics/view/438777_160727_TV_Vol_2_320K.html (19:35, Aug 02 2016), a UnicodeEncodeError is thrown. The complete error message follows. The first line, 已完成, means "completed"; the garbled string in the last frame is the UTF-8 source literal [正在下载] ("downloading") as rendered by the GBK console.

    已完成:186/204
    Traceback (most recent call last):
      File "DMHY_DataBase.py", line 357, in <module>
        DataBase.start_requests()
      File "DMHY_DataBase.py", line 172, in start_requests
        self.parse_item(update_list[i], con)
      File "DMHY_DataBase.py", line 218, in parse_item
        print u"[姝e湪涓嬭浇] " + item_title
    UnicodeEncodeError: 'gbk' codec can't encode character u'\u30fb' in position 28: illegal multibyte sequence
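The failure can probably be reproduced without the crawler. The following is a minimal sketch, not code from DMHY_DataBase.py: the helper name `can_encode` is hypothetical, and it only checks whether a string survives encoding with a given codec. It assumes the title contained U+30FB (the katakana middle dot), as the error message reports.

```python
def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+30FB is the character named in the error message above.
middle_dot = u"\u30fb"

print(can_encode(middle_dot, "gbk"))      # the codec used by the Windows cmd console here
print(can_encode(middle_dot, "gb18030"))  # a superset codec that covers all of Unicode
print(can_encode(middle_dot, "utf-8"))
```

Printing a unicode string to a console forces it through the console's codec, so any title containing a character outside GBK would be expected to trigger the same exception.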

VijayQin commented 8 years ago

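Fixed at 22:06, Aug 08 2016, by wrapping the print in try...except, for example:

    try:
        print u"[正在下载] " + item_title  # "[Downloading] " + title
    except UnicodeEncodeError:
        # Fall back to GBK bytes, dropping characters the codec cannot encode.
        print (u"[正在下载] " + item_title).encode("GBK", 'ignore')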
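If the same problem shows up at other print sites, the fallback could be centralized in one helper. This is only a sketch under assumptions: `console_safe` and `safe_print` are hypothetical names that do not exist in the project, and it uses 'replace' rather than 'ignore' so that dropped characters remain visible as '?'. It is written to run on both Python 2 and Python 3.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target codec cannot encode replaced by '?'."""
    # sys.stdout.encoding can be None (e.g. when output is piped), so fall back to UTF-8.
    enc = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # Round-trip through the codec: unencodable characters become '?'.
    return text.encode(enc, "replace").decode(enc)


def safe_print(text, encoding=None):
    """Print text after making it safe for the console codec."""
    print(console_safe(text, encoding))


# Hypothetical title containing U+30FB, forced through GBK as on Windows cmd.
title = u"TV\u30fbVol 2"
safe_print(u"[Downloading] " + title, "gbk")
```

Compared with catching the exception at each call site, this keeps the encoding decision in one place and avoids a second print attempt after the first one fails.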