Kyokoning / lofter_crawler

crawler
46 stars 6 forks source link

爬取过程中出现问题中止 #1

Closed fsouls closed 4 years ago

fsouls commented 4 years ago

EXTRA: TIME_STAMP: True OUTPUT_DIR: output/sc TAG: [] TARGET: ARTICAL: True HOT_THRE: 50 PICTURE: True TAG_MINUS: [] TAG_PLUS: ['sc'] TITLE: [] TYPE: USER USER: [897737302] => The blog id of 897737302 is: 486931493 => start ! => NewBlog:40 TotalBlogs:40 http://897737302.lofter.com/post/1d05fc25_1c9a3fa45 http://897737302.lofter.com/post/1d05fc25_1c9986c87 Traceback (most recent call last): File "tool/crawler.py", line 36, in person_blog(cfg, str(user)) File "E:\Program files\lofter_crawler-master\tool..\lib\user_pigeonhole.py", line 163, in person_blog _capture_blog(headers, url, hot, cfg) File "E:\Program files\lofter_crawler-master\tool..\lib\user_pigeonhole.py", line 133, in _capture_blog f.write(html) UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' in position 3411 : illegal multibyte sequence

请问这是什么原因,用的就是原配置文件

Kyokoning commented 4 years ago

你好,谢谢反馈!我刚看了下代码,出现这种问题的原因是windows平台在写入文件时会默认转码成gck,遇到某些gbk不支持的字符就会报错。修改代码之后这个问题应该得到解决了。