Closed fsouls closed 4 years ago
EXTRA: TIME_STAMP: True OUTPUT_DIR: output/sc TAG: [] TARGET: ARTICAL: True HOT_THRE: 50 PICTURE: True TAG_MINUS: [] TAG_PLUS: ['sc'] TITLE: [] TYPE: USER USER: [897737302] => The blog id of 897737302 is: 486931493 => start ! => NewBlog:40 TotalBlogs:40 http://897737302.lofter.com/post/1d05fc25_1c9a3fa45 http://897737302.lofter.com/post/1d05fc25_1c9986c87 Traceback (most recent call last): File "tool/crawler.py", line 36, in person_blog(cfg, str(user)) File "E:\Program files\lofter_crawler-master\tool..\lib\user_pigeonhole.py", line 163, in person_blog _capture_blog(headers, url, hot, cfg) File "E:\Program files\lofter_crawler-master\tool..\lib\user_pigeonhole.py", line 133, in _capture_blog f.write(html) UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' in position 3411 : illegal multibyte sequence
请问这是什么原因,用的就是原配置文件
你好,谢谢反馈!我刚看了下代码,出现这种问题的原因是windows平台在写入文件时会默认转码成gck,遇到某些gbk不支持的字符就会报错。修改代码之后这个问题应该得到解决了。
EXTRA: TIME_STAMP: True OUTPUT_DIR: output/sc TAG: [] TARGET: ARTICAL: True HOT_THRE: 50 PICTURE: True TAG_MINUS: [] TAG_PLUS: ['sc'] TITLE: [] TYPE: USER USER: [897737302] => The blog id of 897737302 is: 486931493 => start ! => NewBlog:40 TotalBlogs:40 http://897737302.lofter.com/post/1d05fc25_1c9a3fa45 http://897737302.lofter.com/post/1d05fc25_1c9986c87 Traceback (most recent call last): File "tool/crawler.py", line 36, in
person_blog(cfg, str(user))
File "E:\Program files\lofter_crawler-master\tool..\lib\user_pigeonhole.py",
line 163, in person_blog
_capture_blog(headers, url, hot, cfg)
File "E:\Program files\lofter_crawler-master\tool..\lib\user_pigeonhole.py",
line 133, in _capture_blog
f.write(html)
UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' in position 3411
: illegal multibyte sequence
请问这是什么原因,用的就是原配置文件