dataabc / weiboSpider

Sina Weibo crawler: scrape Sina Weibo data with Python

How can I build on this project to crawl Weibo posts containing a keyword, by modifying the regular expression in the weibo class? #258

Closed AprilWang711 closed 3 years ago

AprilWang711 commented 3 years ago

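I want to crawl only the posts of a specified user that contain a keyword. How should I modify this project to do that? Should I add a regular expression under content in the weibo class? If so, how exactly should it be written?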

dataabc commented 3 years ago

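Crawling takes the same time whether or not a keyword is specified, and you can also search for the keyword in the result files or the database afterwards. If you want to keep only the posts that contain the keyword, you can modify the get_weibo_info method in spider.py. Before the change: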

    def get_weibo_info(self):
        """Get weibo information"""
                    ...
                    if weibos:
                        yield weibos
                    ...

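After the change:

    def get_weibo_info(self):
        """Get weibo information"""
                    ...
                    key_word = 'keyword'  # replace with the keyword you want
                    # keep only the posts whose text contains the keyword
                    weibos = [w for w in weibos if key_word in w.content]
                    if weibos:
                        yield weibos
                    ...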
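
To try the same filter outside the spider, here is a minimal self-contained sketch. The `Weibo` dataclass is a hypothetical stand-in with only a `content` attribute (the project's real object has more fields), and `filter_by_keywords` is an illustrative helper, not part of weiboSpider. It uses a regular expression, as the question asked, so that several keywords can be matched at once.

```python
import re
from dataclasses import dataclass


@dataclass
class Weibo:
    """Hypothetical stand-in for the project's weibo object; only `content` is used."""
    content: str


def filter_by_keywords(weibos, keywords):
    """Keep the weibos whose content contains at least one of the keywords."""
    if not keywords:
        # No keywords given: nothing to filter on, so keep everything.
        return list(weibos)
    # re.escape makes each keyword match literally, even if it contains
    # characters such as '.' or '+' that are special in a regex.
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    return [w for w in weibos if w.content and pattern.search(w.content)]


# One "page" of made-up posts, standing in for the list the spider yields.
page = [
    Weibo("今天天气很好"),
    Weibo("Python 爬虫入门"),
    Weibo("weekend hiking photos"),
]
matched = filter_by_keywords(page, ["爬虫", "hiking"])
print([w.content for w in matched])
```

For a single keyword, the plain `key_word in w.content` test above is enough; the regex only pays off when matching several keywords or a pattern.
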
stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

rever67697 commented 3 years ago

👍

stale[bot] commented 3 years ago

Closing as stale, please reopen if you'd like to work on this further.