dataabc / weiboSpider

Sina Weibo crawler: scrape Sina Weibo data with Python
8.37k stars 1.98k forks

Fetching my own Weibo posts #113

Closed Hylan129 closed 4 years ago

Hylan129 commented 4 years ago

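When I crawl my own Weibo history, the original program never retrieves anything, yet it reports no error. Crawling other users' posts works normally.

After some analysis I found that changing the URL fixes it: append "/profile" to the URL in the following two places.

1:

    def get_weibo_info(self):
        """Get Weibo info."""
        try:
            url = 'https://weibo.cn/%s/profile' % (self.user_config['user_uri'])

2:

    def get_one_page(self, page):
        """Get all posts on page `page`."""
        try:
            url = 'https://weibo.cn/%s/profile?page=%d' % (
                self.user_config['user_uri'], page)

P.S. After switching to the new URLs, crawling other users' posts still works.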

dataabc commented 4 years ago

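Thanks for the feedback.

A very good suggestion, but it conflicts with part of the current functionality. At present user_id may be either the real user ID or a custom domain. For example, Hu Ge's Weibo page is https://weibo.cn/hu_ge, where "hu_ge" is the custom domain. With "/profile" appended, the information is fetched correctly when user_id is a real ID, but fetching fails when it is a custom domain. Since many Weibo accounts use the custom-domain form, the program will stay unchanged for now so that it remains more broadly usable.

Thanks again. If you find other problems, further feedback is welcome :smile: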
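
One way to reconcile the two cases, sketched under stated assumptions: append "/profile" only when the configured value looks like a real ID, and leave custom domains untouched. The helper name `build_weibo_url` and the all-digits test are hypothetical and not part of weiboSpider; the sketch assumes that real IDs are purely numeric and custom domains are not.

```python
def build_weibo_url(user_uri, page=None):
    """Build a weibo.cn URL for a user (hypothetical helper, not weiboSpider code).

    Assumption: a real user ID consists only of digits, while a custom
    domain such as 'hu_ge' does not.
    """
    base = 'https://weibo.cn/%s' % user_uri
    if user_uri.isdigit():
        # Real ID: the '/profile' suffix is reported to work in this thread.
        base += '/profile'
    if page is not None:
        # Same query parameter as in the snippets quoted in the issue.
        base += '?page=%d' % page
    return base


print(build_weibo_url('hu_ge'))          # custom domain, left unchanged
print(build_weibo_url('1234567890', 2))  # placeholder numeric ID, page 2
```

If some custom domains turn out to be all digits, the heuristic would misclassify them, so it would need checking against real accounts before use.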

purplepalmdash commented 4 years ago

If images cannot be fetched, the following fix works; shared for reference:

    #first_pic = 'https://weibo.cn/mblog/pic/' + weibo_id + '?rl=0'
    first_pic = 'https://weibo.cn/mblog/pic/' + weibo_id + '?rl=1'
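
The same change can be written as a small function so that the `rl` value is easy to switch while testing. This is only a sketch: `first_pic_url` is a hypothetical name, and the thread does not say what `rl` means, only that `rl=1` worked where `rl=0` did not.

```python
def first_pic_url(weibo_id, rl=1):
    """Build the picture-page URL for one post (hypothetical helper).

    The default rl=1 follows the fix reported in this thread; the
    meaning of the parameter itself is not documented here.
    """
    return 'https://weibo.cn/mblog/pic/%s?rl=%d' % (weibo_id, rl)


print(first_pic_url('EXAMPLE_ID'))  # 'EXAMPLE_ID' is a placeholder, not a real post ID
```
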
Hylan129 commented 4 years ago

@dataabc OK, understood. I have already solved my own problem. Thanks for the reply.

Hylan129 commented 4 years ago

> If images cannot be fetched, the following fix works; shared for reference:
>
>     #first_pic = 'https://weibo.cn/mblog/pic/' + weibo_id + '?rl=0'
>     first_pic = 'https://weibo.cn/mblog/pic/' + weibo_id + '?rl=1'

@purplepalmdash Thanks, solved!

Yuuoniy commented 4 years ago

Where in the code are the two URLs mentioned above? I can see the get_weibo_info function in spider.py, but not where url is assigned.

songzy12 commented 4 years ago

@Yuuoniy Hi, URL construction currently all lives in the parser module: https://github.com/dataabc/weiboSpider/tree/master/weibo_spider/parser

Each parser corresponds to one family of related URLs.

songzy12 commented 4 years ago

@Yuuoniy More specifically, here:

https://github.com/dataabc/weiboSpider/blob/master/weibo_spider/parser/mblog_picAll_parser.py#L8

scriptway commented 3 years ago

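> Where in the code are the two URLs mentioned above? I can see the get_weibo_info function in spider.py, but not where url is assigned.

The program has been updated and the way URLs are referenced has changed. I ran into this problem too: I added profile to the URLs in index_parser, info_parser, and page_parser under the parser directory, but my own posts still cannot be parsed. According to the error output, the xpath matches no data. Some of my posts are visible only to me, but after inspecting the page structure I found that the div of a private post is no different from the div of other people's posts, so I can't tell at which step it fails. The error output is as follows: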

```
list index out of range
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/parser/info_parser.py", line 39, in extract_user_info
    if self.selector.xpath(
IndexError: list index out of range
'NoneType' object has no attribute 'id'
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/parser/index_parser.py", line 36, in get_user
    self.user.id = user_id
AttributeError: 'NoneType' object has no attribute 'id'
None
'NoneType' object has no attribute 'nickname'
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/spider.py", line 188, in _get_filepath
    self.user.nickname)
AttributeError: 'NoneType' object has no attribute 'nickname'
expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/writer/csv_writer.py", line 25, in __init__
    with open(self.file_path, 'a', encoding='utf-8-sig',
TypeError: expected str, bytes or os.PathLike object, not NoneType
'NoneType' object has no attribute 'nickname'
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/spider.py", line 188, in _get_filepath
    self.user.nickname)
AttributeError: 'NoneType' object has no attribute 'nickname'
'NoneType' object has no attribute 'nickname'
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/spider.py", line 188, in _get_filepath
    self.user.nickname)
AttributeError: 'NoneType' object has no attribute 'nickname'
'NoneType' object has no attribute 'nickname'
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/spider.py", line 188, in _get_filepath
    self.user.nickname)
AttributeError: 'NoneType' object has no attribute 'nickname'
'NoneType' object has no attribute '__dict__'
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/spider.py", line 269, in start
    self.write_user(self.user)
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/spider.py", line 114, in write_user
    writer.write_user(user)
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/writer/txt_writer.py", line 29, in write_user
    [v + ':' + str(self.user.__dict__[k]) for k, v in self.user_desc])
  File "/usr/local/lib/python3.9/site-packages/weibo_spider/writer/txt_writer.py", line 29, in <listcomp>
    [v + ':' + str(self.user.__dict__[k]) for k, v in self.user_desc])
AttributeError: 'NoneType' object has no attribute '__dict__'
```
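
The first error in this log is an IndexError raised next to an xpath call, which is what happens when an empty match list is indexed; the later 'NoneType' errors follow from the user object never being built. A minimal guard might look like the sketch below. `first_or_default` is a hypothetical helper, not weiboSpider code, and it only makes the failure explicit rather than fixing its cause.

```python
def first_or_default(matches, default=None):
    """Return the first element of a result list, or `default` if it is empty.

    Intended for lists such as those returned by an xpath query, where
    indexing [0] on an empty result raises IndexError.
    """
    return matches[0] if matches else default


# Simulated results: an empty list stands in for an xpath query that
# matched nothing (for example, because the page was not the expected one).
no_match = []
one_match = ['nickname-node']

print(first_or_default(no_match))   # falls back to None instead of raising
print(first_or_default(one_match))  # first matched element
```

A caller can then check for `None` and stop with a clear message instead of continuing with a missing user.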

dataabc commented 3 years ago

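@scriptway That happens because you were crawling too fast and got temporarily rate-limited. Lower the crawl speed; following question 2 of the FAQ will do it.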
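
For readers without the FAQ at hand, the general idea of slowing a crawler can be sketched as a random pause taken every few pages. This is a generic illustration, not the change the FAQ describes: `polite_pause` and all of its default values are assumptions chosen for the example.

```python
import random
import time


def polite_pause(page, every=5, min_seconds=6.0, max_seconds=10.0,
                 sleep=time.sleep):
    """Pause for a random interval after every `every` pages.

    All defaults are illustrative. Returns the number of seconds slept,
    or 0.0 when no pause was due. `sleep` is injectable so the function
    can be exercised without actually waiting.
    """
    if page % every != 0:
        return 0.0
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


# Dry run: record the delays instead of sleeping.
recorded = []
for page in range(1, 11):
    polite_pause(page, sleep=recorded.append)
print(len(recorded))  # pauses are taken on pages 5 and 10 only
```

Longer or more frequent pauses reduce the chance of being limited at the cost of a slower crawl.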