dataabc / weiboSpider

A Sina Weibo crawler: scrape Sina Weibo data with Python
8.44k stars 1.98k forks

line 1: b'ID C_5074630370399522 already defined' (line 1) #602

Open cinyearchan opened 2 months ago

cinyearchan commented 2 months ago

To help resolve the problem, please answer the questions below carefully. Once the problem is resolved, please close this issue promptly.

A: PyPI version

A: Yes

A: No

A:

A: user_id: 2492465520; since_date: 2009-08-28; end_date: now. user_id_list.txt contains: 2492465520 刘晓光_恶魔奶爸 2024-08-22 10:21

A: The following message appears multiple times during a single crawl:

line 1: b'ID C_5074630370399522 already defined' (line 1)
Traceback (most recent call last):
  File "/Users/xxx/.pyenv/versions/3.9.7/lib/python3.9/site-packages/weibo_spider/parser/util.py", line 42, in handle_html
    selector = etree.HTML(resp.content)
  File "src/lxml/etree.pyx", line 3170, in lxml.etree.HTML
  File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1765, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1127, in lxml.etree._BaseParser._parseDoc
  File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 649, in lxml.etree._raiseParseError
  File "<string>", line 1
lxml.etree.XMLSyntaxError: line 1: b'ID C_5074630370399522 already defined'
dataabc commented 2 months ago

I can't debug this right now. You can refer to https://www.mail-archive.com/lxml@python.org/msg00213.html
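
The traceback shows the failure is raised by `etree.HTML(resp.content)` inside `handle_html`, when the page contains the same `id` value more than once. One possible workaround, not confirmed anywhere in this thread, is to pass an explicit `HTMLParser` with `collect_ids=False` so that lxml does not build its ID lookup table. The sketch below rests on assumptions: `parse_html_lenient` is a hypothetical helper name, the sample markup is made up to imitate the duplicate ID, and whether this actually removes the error depends on the installed lxml and libxml2 versions.

```python
from lxml import etree


def parse_html_lenient(content: bytes):
    """Hypothetical helper: parse HTML without building lxml's ID lookup table."""
    # recover=True is already the HTMLParser default; it is spelled out for clarity.
    # collect_ids=False asks lxml not to collect ID attributes into a hash table,
    # which is assumed (not verified here) to sidestep the duplicate-ID complaint.
    parser = etree.HTMLParser(recover=True, collect_ids=False)
    return etree.HTML(content, parser=parser)


# Made-up sample page that uses the same id twice, imitating the reported error.
sample = b'<html><body><div id="C_1">a</div><div id="C_1">b</div></body></html>'
selector = parse_html_lenient(sample)

# XPath attribute matching does not rely on the ID table, so both nodes are found.
texts = selector.xpath('//div[@id="C_1"]/text()')
```

If this were tried in weiboSpider, the change would go where `handle_html` in `weibo_spider/parser/util.py` currently calls `etree.HTML(resp.content)`. That would be a local patch to the installed package, not something the maintainer has endorsed here.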