Scraped Weibo headline article URL is empty

dataabc / weibo-crawler

A Sina Weibo crawler: uses Python to scrape Sina Weibo data and download Weibo images and videos


Scraped Weibo headline article URL is empty #280

Open sushengbuhuo opened 2 years ago

sushengbuhuo commented 2 years ago

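I scraped uid 5044429589 (with the cookie filled in), but the headline article URL is empty for every row in the resulting Excel file. Any idea what the problem is?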

dataabc commented 2 years ago

The current code may still have a bug. I can't debug it right now, so I'll look into it later.

sushengbuhuo commented 2 years ago

I made a small change and it works now:

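def get_article_url(self, selector):
    """Get the URL of the headline article in a weibo."""
    article_url = ''
    # Flatten the node to plain text to see how the weibo starts.
    text = selector.xpath('string(.)')
    # '发布了头条文章' means "published a headline article".
    if text.startswith(u'发布了头条文章'):
        # Take the link from the href attribute of the <a> tags.
        url = selector.xpath('//a/@href')
        if url and url[0].startswith('https://'):
            article_url = url[0]
    return article_url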
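
For anyone who wants to check the logic of this fix outside the crawler, here is a minimal standalone sketch. It is not the project's code: it swaps lxml for the standard library's `html.parser` so it runs without extra dependencies, and the class name `ArticleLinkParser` and both HTML fragments are hypothetical, made up purely for illustration. Real Weibo markup may look different.

```python
from html.parser import HTMLParser


class ArticleLinkParser(HTMLParser):
    """Collect the visible text and all <a href> values from one weibo's HTML."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.hrefs.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def get_article_url(html):
    """Return the first link if the weibo announces a headline article, else ''."""
    parser = ArticleLinkParser()
    parser.feed(html)
    parser.close()
    text = ''.join(parser.text_parts)
    # Same two checks as the fix above: text prefix, then an https:// link.
    if text.startswith('发布了头条文章') and parser.hrefs:
        if parser.hrefs[0].startswith('https://'):
            return parser.hrefs[0]
    return ''


# Hypothetical fragments, not captured from Weibo.
sample = '发布了头条文章：《示例》 <a href="https://example.com/article/1">网页链接</a>'
other = '普通微博 <a href="https://example.com/page">网页链接</a>'
print(get_article_url(sample))  # the article link
print(get_article_url(other))   # empty string, not a headline article post
```

Like the fix above, the sketch only looks at the first link, so a weibo whose first `<a>` is not an `https://` URL still yields an empty string.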

dataabc commented 2 years ago

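Thanks for the feedback and for providing a fix. If it is convenient, could you submit the code as a pull request? That way you would become a contributor to this project. This is not required; if it is inconvenient, I will make the change myself later. Thanks again.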