dataabc / weibo-crawler

Sina Weibo crawler: uses Python to crawl Sina Weibo data and download Weibo images and videos

LivePhoto is not crawled even though a cookie has been added #448

Open hwangzhun opened 1 month ago

hwangzhun commented 1 month ago

    2024-07-11 19:44:02,082 - ERROR - 'large'
    Traceback (most recent call last):
      File "C:\Users\huang\Desktop\weibo-crawler-master\weibo.py", line 884, in get_one_weibo
        weibo = self.get_long_weibo(weibo_id)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\huang\Desktop\weibo-crawler-master\weibo.py", line 443, in get_long_weibo
        weibo = self.parse_weibo(weibo_info)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\huang\Desktop\weibo-crawler-master\weibo.py", line 785, in parse_weibo
        weibo["pics"] = self.get_pics(weibo_info)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\huang\Desktop\weibo-crawler-master\weibo.py", line 451, in get_pics
        pic_list = [pic["large"]["url"] for pic in pic_info]
    KeyError: 'large'

Judging from the log, it looks like the problem is here?

dataabc commented 1 month ago

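It looks like the `large` data was not retrieved. That should be an image problem and unrelated to LivePhoto. I can't debug right now, so I'm not sure what is going on. You could try replacing the cookie; it may have expired, though I'm not certain.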
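The `KeyError` in the traceback comes from indexing `pic["large"]` on an entry that has no such key. A minimal sketch of one way to tolerate that while the cause is investigated is shown here. The name `get_pic_urls` is hypothetical (it is not the project's `get_pics`), and falling back to the entry's own `url` field is an assumption about what is acceptable as a substitute image.

```python
from typing import Any, Dict, List


def get_pic_urls(weibo_info: Dict[str, Any]) -> List[str]:
    """Collect picture URLs, tolerating entries that lack a "large" key."""
    urls = []
    for pic in weibo_info.get("pics") or []:
        # Prefer the full-size URL; otherwise fall back to the entry's own
        # "url" field (assumption: a smaller rendition is better than a crash).
        url = (pic.get("large") or {}).get("url") or pic.get("url")
        if url:
            urls.append(url)
    return urls


# Illustrative data only; example.com URLs are placeholders.
sample = {
    "pics": [
        {
            "url": "https://example.com/orj360/a.jpg",
            "large": {"url": "https://example.com/large/a.jpg"},
        },
        {"url": "https://example.com/orj360/b.jpg"},  # no "large" key
    ]
}
print(get_pic_urls(sample))
```

This only hides the symptom: it does not explain why some entries arrive without `large`.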

hwangzhun commented 1 month ago

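I tried replacing the cookie and also verified that it is valid, but the livephoto still cannot be downloaded, so I don't think the cookie is the problem. I printed the data received inside the get_pics function in the source and found that the livephoto links are being crawled; the links open fine and show the livephoto. (I'm a beginner, so any pointers would be appreciated. Thanks.)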

Printed data:

    {
      "visible": { "type": 0, "list_id": 0 },
      "mark": "followtopweibo",
      "created_at": "Mon May 24 20:26:57 +0800 2021",
      "id": "4640476257583704",
      "mid": "4640476257583704",
      "can_edit": false,
      "photoTag": [
        {
          "picid": "008gLOvsly1gqtsew42bvj31qz33y4qu",
          "hastag": true,
          "taginfo": {
            "code": "100000",
            "msg": "",
            "data": {
              "4640476262170935": {
                "pic_object_id": "1042018:85eccedb7467629dcfb61725e28a599f",
                "photo_id": 4640476262170935,
                "mid": 4640476257583704,
                "pid": "008gLOvsly1gqtsew42bvj31qz33y4qu",
                "uid": 7576879598,
                "pic_tags": [
                  {
                    "tag_uid": 7576879598,
                    "tag_id": "1022: 2315222e5145e3750ea7d718df96d397e48f3d",
                    "tag_name": "万达广场",
                    "tag_type": "search_topic",
                    "pos_x": "0.74931506849315",
                    "pos_y": "0.39719900744417",
                    "dir": 1,
                    "url": "https://s.weibo.com/pic/%23%E4%B8%87%E8%BE%BE%E5%B9%BF%E5%9C%BA%23",
                    "mobile_url": "sinaweibo://searchall?containerid=231522&q=%23%E4%B8%87%E8%BE%BE%E5%B9%BF%E5%9C%BA%23&isnewpage=1"
                  }
                ]
              }
            }
          }
        },
        {
          "picid": "008gLOvsly1gqtsb9tt57j322i2xye8b",
          "hastag": true,
          "taginfo": {
            "code": "100000",
            "msg": "",
            "data": {
              "4640476262433251": {
                "pic_object_id": "1042018: 77c74d742c7de259e8c6176cb5532cd8",
                "photo_id": 4640476262433251,
                "mid": 4640476257583704,
                "pid": "008gLOvsly1gqtsb9tt57j322i2xye8b",
                "uid": 7576879598,
                "pic_tags": [
                  {
                    "tag_uid": 7576879598,
                    "tag_id": "1022: 231522ab45d20b9d1f082f439923af4210ea0a",
                    "tag_name": "一番街",
                    "tag_type": "search_topic",
                    "pos_x": "0.027397260273973",
                    "pos_y": "0.30979226423294",
                    "dir": 2,
                    "url": "https://s.weibo.com/pic/%23%E4%B8%80%E7%95%AA%E8%A1%97%23",
                    "mobile_url": "sinaweibo://searchall?containerid=231522&q=%23%E4%B8%80%E7%95%AA%E8%A1%97%23"
                  }
                ]
              }
            }
          }
        }
      ],
      "text": "先变成自己喜欢的样子,再去遇见无需取悦的人 <a href=\"https://m.weibo.cn/search?containerid=231522type%3D1%26t%3D10%26q%3D%23%E9%9A%8F%E6%89%8B%E6%8B%8D%23\" data-hide=\"\"><span class=\"surl-text\">#随手拍# <a href=\"https://m.weibo.cn/search?containerid=231522type%3D1%26t%3D10%26q%3D%23%E4%BB%8A%E5%A4%A9%E7%A9%BF%E4%BB%80%E4%B9%88%23&isnewpage=1\" data-hide=\"\"><span class=\"surl-text\">#今天穿什么# <a href=\"https://m.weibo.cn/search?containerid=231522type%3D1%26t%3D10%26q%3D%23%E5%A4%8F%E5%A4%A9%23&isnewpage=1\" data-hide=\"\"><span class=\"surl-text\">#夏天# <a href=\"http://weibo.com/p/100101B2094550D56AABFA499A\" data-hide=\"\"><span class=\"surl-text\">肇庆·广东理工学院鼎湖校区 ",
      "textLength": 88,
      "source": "iPhone客户端",
      "favorited": false,
      "pic_ids": [
        "008gLOvsly1gqts81gy3ej324c2rungw",
        "008gLOvsly1gqts8dp4fuj31pw2hr1kz",
        "008gLOvsly1gqtsbcpfh8j32502upe1l",
        "008gLOvsly1gqtsb9tt57j322i2xye8b",
        "008gLOvsly1gqtsa9c9gqj328y31che8",
        "008gLOvsly1gqtsew42bvj31qz33y4qu"
      ],
      "thumbnail_pic": "https://wx1.sinaimg.cn/thumbnail/008gLOvsly1gqts81gy3ej324c2rungw.jpg",
      "bmiddle_pic": "http://wx1.sinaimg.cn/bmiddle/008gLOvsly1gqts81gy3ej324c2rungw.jpg",
      "original_pic": "https://wx1.sinaimg.cn/large/008gLOvsly1gqts81gy3ej324c2rungw.jpg",
      "is_paid": false,
      "mblog_vip_type": 0,
      "user": {
        "id": 7576879598,
        "screen_name": "小Miki喵",
        "profile_image_url": "https://tvax4.sinaimg.cn/crop.0.0.1080.1080.180/008gLOvsly8hqwc8k66b5j30u00u0n0q.jpg?KID=imgbed,tva&Expires=1720766130&ssig=LudkCXVJHV",
        "profile_url": "https://m.weibo.cn/u/7576879598?",
        "close_blue_v": false,
        "description": "抖y:小MIKI",
        "follow_me": false,
        "following": true,
        "follow_count": 103,
        "followers_count": "1627",
        "cover_image_phone": "https://tva1.sinaimg.cn/crop.0.0.640.640.640/549d0121tw1egm1kjly3jj20hs0hsq4f.jpg",
        "avatar_hd": "https://wx4.sinaimg.cn/orj480/008gLOvsly8hqwc8k66b5j30u00u0n0q.jpg",
        "badge": { "user_name_certificate": 1, "city_university": 19 },
        "statuses_count": 358,
        "verified": false,
        "verified_type": -1,
        "gender": "f",
        "mbtype": 11,
        "svip": 1,
        "urank": 0,
        "mbrank": 1,
        "followers_count_str": "1627",
        "verified_reason": "",
        "like": false,
        "like_me": false,
        "special_follow": false
      },
      "can_remark": true,
      "reposts_count": 0,
      "comments_count": 8,
      "reprint_cmt_count": 0,
      "attitudes_count": 14,
      "mixed_count": 0,
      "pending_approval_count": 0,
      "isLongText": false,
      "show_mlevel": 0,
      "expire_time": 1624071669,
      "ad_state": 1,
      "darwin_tags": [],
      "ad_marked": false,
      "mblogtype": 1,
      "item_category": "status",
      "rid": "5_0_50_162659327855539511_0_0_0",
      "extern_safe": 0,
      "number_display_strategy": { "apply_scenario_flag": 19, "display_text_min_number": 1000000, "display_text": "100万+" },
      "content_auth": 0,
      "is_show_mixed": false,
      "comment_manage_info": { "comment_permission_type": -1, "approval_comment_type": 0, "comment_sort_type": 0 },
      "pic_num": 6,
      "mlevel": 0,
      "mblog_menu_new_style": 0,
      "page_info": {
        "type": "place",
        "icon": "https://h5.sinaimg.cn/upload/2016/03/15/196/timeline_icon_location_default.png",
        "page_pic": { "url": "https://wx2.sinaimg.cn/wap180/82ef82cbly1fjihaypsj2j205k05kwg1.jpg", "width": "88", "height": "88" },
        "pageurl": "https://m.weibo.cn/p/index?containerid=1008089b01eaf54b1a61c904fbb7e053cf64e0-_lbs&lcardid=frompoi&extparam=frompoi",
        "page_title": "肇庆·广东理工学院鼎湖校区",
        "content1": "坑口金鼎路(庆云大道)",
        "content2": "2801人来过 8152条微博 6185张图片"
      },
      "pics": [
        {
          "pid": "008gLOvsly1gqts81gy3ej324c2rungw",
          "url": "https://wx1.sinaimg.cn/orj360/008gLOvsly1gqts81gy3ej324c2rungw.jpg",
          "size": "orj360",
          "geo": { "width": 360, "height": 470, "croped": false },
          "large": {
            "size": "large",
            "url": "https://wx1.sinaimg.cn/large/008gLOvsly1gqts81gy3ej324c2rungw.jpg",
            "geo": { "width": 2048, "height": 2678, "croped": false }
          },
          "videoSrc": "https://video.weibo.com/media/play?livephoto=https%3A%2F%2Flivephoto.us.sinaimg.cn%2F003db0tbjx07MULutRao0f0f0100dBXr0k01.mov",
          "type": "livephoto"
        },
        {
          "pid": "008gLOvsly1gqts8dp4fuj31pw2hr1kz",
          "url": "https://wx3.sinaimg.cn/orj360/008gLOvsly1gqts8dp4fuj31pw2hr1kz.jpg",
          "size": "orj360",
          "geo": { "width": 360, "height": 521, "croped": false },
          "large": {
            "size": "large",
            "url": "https://wx3.sinaimg.cn/large/008gLOvsly1gqts8dp4fuj31pw2hr1kz.jpg",
            "geo": { "width": 2048, "height": 2969, "croped": false }
          }
        },
        {
          "pid": "008gLOvsly1gqtsbcpfh8j32502upe1l",
          "url": "https://wx2.sinaimg.cn/orj360/008gLOvsly1gqtsbcpfh8j32502upe1l.jpg",
          "size": "orj360",
          "geo": { "width": 360, "height": 480, "croped": false },
          "large": {
            "size": "large",
            "url": "https://wx2.sinaimg.cn/large/008gLOvsly1gqtsbcpfh8j32502upe1l.jpg",
            "geo": { "width": 2048, "height": 2731, "croped": false }
          },
          "videoSrc": "https://video.weibo.com/media/play?livephoto=https%3A%2F%2Flivephoto.us.sinaimg.cn%2F003EY6fWjx07MULts9680f0f01008pOo0k01.mov",
          "type": "livephoto"
        },
        {
          "pid": "008gLOvsly1gqtsb9tt57j322i2xye8b",
          "url": "https://wx2.sinaimg.cn/orj360/008gLOvsly1gqtsb9tt57j322i2xye8b.jpg",
          "size": "orj360",
          "geo": { "width": 360, "height": 511, "croped": false },
          "large": {
            "size": "large",
            "url": "https://wx2.sinaimg.cn/large/008gLOvsly1gqtsb9tt57j322i2xye8b.jpg",
            "geo": { "width": 2048, "height": 2912, "croped": false }
          }
        },
        {
          "pid": "008gLOvsly1gqtsa9c9gqj328y31che8",
          "url": "https://wx1.sinaimg.cn/orj360/008gLOvsly1gqtsa9c9gqj328y31che8.jpg",
          "size": "orj360",
          "geo": { "width": 360, "height": 486, "croped": false },
          "large": {
            "size": "large",
            "url": "https://wx1.sinaimg.cn/large/008gLOvsly1gqtsa9c9gqj328y31che8.jpg",
            "geo": { "width": 2048, "height": 2766, "croped": false }
          }
        },
        {
          "pid": "008gLOvsly1gqtsew42bvj31qz33y4qu",
          "url": "https://wx3.sinaimg.cn/orj360/008gLOvsly1gqtsew42bvj31qz33y4qu.jpg",
          "size": "orj360",
          "geo": { "width": 360, "height": 639, "croped": false },
          "large": {
            "size": "large",
            "url": "https://wx3.sinaimg.cn/large/008gLOvsly1gqtsew42bvj31qz33y4qu.jpg",
            "geo": { "width": 2048, "height": 3640, "croped": false }
          }
        }
      ],
      "live_photo": [
        "https://video.weibo.com/media/play?livephoto=https%3A%2F%2Flivephoto.us.sinaimg.cn%2F003db0tbjx07MULutRao0f0f0100dBXr0k01.mov",
        "https://video.weibo.com/media/play?livephoto=https%3A%2F%2Flivephoto.us.sinaimg.cn%2F003EY6fWjx07MULts9680f0f01008pOo0k01.mov"
      ],
      "bid": "KgYYhvORO",
      "pic_list": [
        "https://wx1.sinaimg.cn/large/008gLOvsly1gqts81gy3ej324c2rungw.jpg",
        "https://wx3.sinaimg.cn/large/008gLOvsly1gqts8dp4fuj31pw2hr1kz.jpg",
        "https://wx2.sinaimg.cn/large/008gLOvsly1gqtsbcpfh8j32502upe1l.jpg",
        "https://wx2.sinaimg.cn/large/008gLOvsly1gqtsb9tt57j322i2xye8b.jpg",
        "https://wx1.sinaimg.cn/large/008gLOvsly1gqtsa9c9gqj328y31che8.jpg",
        "https://wx3.sinaimg.cn/large/008gLOvsly1gqtsew42bvj31qz33y4qu.jpg"
      ]
    }

It looks like the image URLs returned here are wrong? Opening them returns 403.

dataabc commented 1 month ago

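From the content above, the live photo information sits under the live_photo key, so the get_live_photo method in weibo.py should be changed to read the content under live_photo. For the image problem, see https://github.com/dataabc/weibo-search/issues/473 .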
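As a rough illustration of reading that key, here is a minimal standalone sketch. The function names `extract_live_photo_urls` and `unwrap_live_photo_url` are hypothetical and not part of weibo.py. The fallback to `pics[*].videoSrc` is an assumption based only on the printed data above, and whether the crawler needs the unwrapped `.mov` address at all is not established in this thread.

```python
from typing import Any, Dict, List, Optional
from urllib.parse import parse_qs, urlparse


def extract_live_photo_urls(weibo_info: Dict[str, Any]) -> List[str]:
    """Return live photo video URLs from a parsed weibo dict."""
    # Primary source: the top-level "live_photo" list seen in the printed data.
    urls = list(weibo_info.get("live_photo") or [])
    if urls:
        return urls
    # Fallback (assumption, not confirmed by the maintainer): picture entries
    # with "type": "livephoto" carry the same link in "videoSrc".
    return [
        pic["videoSrc"]
        for pic in weibo_info.get("pics") or []
        if pic.get("type") == "livephoto" and pic.get("videoSrc")
    ]


def unwrap_live_photo_url(url: str) -> Optional[str]:
    """Return the decoded address held in the "livephoto" query parameter."""
    values = parse_qs(urlparse(url).query).get("livephoto")
    return values[0] if values else None


# One of the links from the printed data.
wrapped = (
    "https://video.weibo.com/media/play?livephoto="
    "https%3A%2F%2Flivephoto.us.sinaimg.cn%2F"
    "003db0tbjx07MULutRao0f0f0100dBXr0k01.mov"
)
sample = {"live_photo": [wrapped]}

for link in extract_live_photo_urls(sample):
    print(link, "->", unwrap_live_photo_url(link))
```

Using `.get` with an empty default means a weibo without live photos yields an empty list instead of raising.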

hwangzhun commented 1 month ago

Thanks a lot. I changed the get_live_photo method to the following and the download succeeded:

    def get_live_photo(self, weibo_info):
        """Get the video URLs of the live photos."""
        live_photo_list = weibo_info.get("live_photo", [])
        return live_photo_list

xiaomeng758 commented 1 month ago

Do I just replace the whole function so that it contains only these three lines?

hwangzhun commented 1 month ago

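> Do I just replace the whole function so that it contains only these three lines?

The def line defines the function and must not be deleted. Replace the code beneath the function definition with these two lines.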

xiaomeng758 commented 1 month ago

[image: ac0fa2e5e77b8443441bfecf7de0970f] Like this?

hwangzhun commented 1 month ago

> [image: ac0fa2e5e77b8443441bfecf7de0970f] Like this?

Yes, that's right.