2023-01-26 12:49:59,376 - ERROR - weibo.py[:842] - 'NoneType' object has no attribute 'xpath'
Traceback (most recent call last):
  File "D:\tmp\code\weibo-crawler\weibo.py", line 836, in get_one_weibo
    weibo = self.parse_weibo(weibo_info)
  File "D:\tmp\code\weibo-crawler\weibo.py", line 732, in parse_weibo
    weibo["article_url"] = self.get_article_url(selector)
  File "D:\tmp\code\weibo-crawler\weibo.py", line 633, in get_article_url
    text = selector.xpath("string(.)")
AttributeError: 'NoneType' object has no attribute 'xpath'
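When a Weibo post's text content is empty (`"mblog"."text": " "` in the JSON), `etree.HTML(text_body)` returns `None`, which breaks the parsing that follows. The resulting error is the traceback shown above.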
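For example, the birthday post that Weibo publishes automatically on a user's birthday has empty content: the raw JSON returned contains `"mblog"."text": " "`.

Fix: append `<hr>` to the end of the empty string. The result is a valid HTML string, so it is parsed correctly and an HTML object is returned, and the later code works as usual. `<hr>` is a self-closing horizontal rule, so it does not affect normal data parsing.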
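
The fix described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual patch: the helper names `pad_empty_html` and `parse_text_body` are hypothetical, and treating any whitespace-only string (including `""`) as "empty" is an assumption.

```python
def pad_empty_html(text_body: str) -> str:
    """Append <hr> when the post text is empty or whitespace-only.

    Hypothetical helper: the name and the "whitespace-only counts as
    empty" rule are assumptions made for this sketch.
    """
    if not text_body.strip():
        # "<hr>" gives the parser a real element to build a tree from.
        return text_body + "<hr>"
    return text_body


def parse_text_body(text_body: str):
    """Parse the post text with lxml after padding empty input.

    Hypothetical wrapper; lxml is imported here so the helper above
    stays usable without the dependency installed.
    """
    from lxml import etree

    return etree.HTML(pad_empty_html(text_body))
```

With this in place, the selector handed to `get_article_url` should be an element rather than `None`, so `selector.xpath("string(.)")` can run; non-empty post text is passed through unchanged.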