NanmiCoder / MediaCrawler

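Xiaohongshu note | comment crawler, Douyin video | comment crawler, Kuaishou video | comment crawler, Bilibili video | comment crawler, Weibo post | comment crawler, Baidu Tieba post | Baidu Tieba comment-reply crawler, Zhihu Q&A article | comment crawler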
https://nanmicoder.github.io/MediaCrawler/

Douyin get_aweme_detail reports "request params incrr", but the video's comments are crawled successfully. #475

Open kayzh324 opened 1 week ago

kayzh324 commented 1 week ago

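In Douyin creator mode, the crawler reports account blocked and exits right after login completes. I stopped crawling for more than half a month, then updated the project and ran it again, and it is still blocked. I cannot find the cause no matter what I try.

On a whim I tried detail mode and found that the error only occurs in DouYinCrawler.get_aweme_detail; DouYinCrawler.get_comments, which fetches the comments, succeeds.

I am completely lost and have no idea where to look.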

`
(.venv) kayzh@debianLocal:~/MediaCrawler$ python main.py --platform dy --lt qrcode --type detail
2024-10-31 18:10:58 MediaCrawler INFO (db.py:73) - [init_db] start init mediacrawler db connect object
2024-10-31 18:10:58 MediaCrawler INFO (db.py:75) - [init_db] end init mediacrawler db connect object
2024-10-31 18:10:58 MediaCrawler INFO (proxy_ip_pool.py:59) - [ProxyIpPool._is_valid_proxy] testing 36.151.192.212 is it valid
2024-10-31 18:11:10 MediaCrawler INFO (login.py:116) - [DouYinLogin.login_by_qrcode] Begin login douyin by qrcode...
(eog:55849): Handy-WARNING **: 18:11:11.171: Using GtkSettings:gtk-application-prefer-dark-theme together with HdyStyleManager is unsupported. Please use HdyStyleManager:color-scheme instead.
2024-10-31 18:11:19 MediaCrawler INFO (login.py:70) - [DouYinLogin.begin] login finished then check login state ...
2024-10-31 18:11:40 MediaCrawler INFO (login.py:79) - [DouYinLogin.begin] Login successful then wait for 5 seconds redirect ...
2024-10-31 18:11:45 MediaCrawler ERROR (client.py:102) - request params incrr, response.text:
2024-10-31 18:11:45 MediaCrawler ERROR (core.py:150) - [DouYinCrawler.get_aweme_detail] Get aweme detail error: account blocked,
2024-10-31 18:11:46 MediaCrawler ERROR (client.py:102) - request params incrr, response.text:
2024-10-31 18:11:46 MediaCrawler ERROR (core.py:150) - [DouYinCrawler.get_aweme_detail] Get aweme detail error: account blocked,
2024-10-31 18:11:46 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7303962837465432873, content: 用python练模型和测试啊,但部署肯定要用c++重写呀,不矛盾啊
2024-10-31 18:11:46 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7281145506382168892, content: python的效率是我见过最低的,还有比python更低的吗
2024-10-31 18:11:46 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7304186472210744104, content: C++的基础库真是少的可怜,有一次我想读取个csv文件,我不假思索写了个line.split(","),结果你猜怎么着,报错了[流泪][流泪][流泪]
2024-10-31 18:11:46 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7308621761716781843, content: python 开发效率高,运行效率低。C++开发效率低,运行效率高。前者可以用高算力机器弥补,后者可以用高水平人力弥补。
2024-10-31 18:11:46 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7281072117196522281, content: 咱又不干大数据商业化,python够用了呗
2024-10-31 18:11:46 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7281079667602883382, content: 说白点,晚出现的语言底层都是c,c++,函数,套件齐全就是慢。c++不全要自己写,难得是写代码的时间。
2024-10-31 18:11:46 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7314371957528527626, content: Python就是c++基础库的精简版
2024-10-31 18:11:46 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7313937205062763291, content: 简单好用,工具多,社区环境好
2024-10-31 18:11:46 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7313453315387360009, content: C是打工人,Python是老板
2024-10-31 18:11:46 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7282193842115347234, content: 建议取消所有语言,只留c,再难学也比一年换一个强
2024-10-31 18:11:47 MediaCrawler INFO (core.py:185) - [DouYinCrawler.get_comments] aweme_id: 7280854932641664319 comments have all been obtained and filtered ...
2024-10-31 18:11:48 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7431204786026791739, content: 求源代码
2024-10-31 18:11:48 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7430467175989609268, content: 求源代码
2024-10-31 18:11:48 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7413519452623192868, content: 只想学游戏脚本该怎么学python
2024-10-31 18:11:48 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7316485654572008202, content: 收费吗
2024-10-31 18:11:48 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7298589399850156850, content: 求源码
2024-10-31 18:11:48 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7300272165453415222, content: 求源码[鼓掌][鼓掌][鼓掌]
2024-10-31 18:11:48 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7319455921330897664, content: 求源码
2024-10-31 18:11:48 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7426388605792174882, content: 求源代码
2024-10-31 18:11:48 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7310162488493900579, content: 求源代码
2024-10-31 18:11:48 MediaCrawler INFO (init.py:106) - [store.douyin.update_dy_aweme_comment] douyin aweme comment: 7281223217397580559, content: 求源代码[流泪]
2024-10-31 18:11:49 MediaCrawler INFO (core.py:185) - [DouYinCrawler.get_comments] aweme_id: 7202432992642387233 comments have all been obtained and filtered ...
2024-10-31 18:11:49 MediaCrawler INFO (core.py:85) - [DouYinCrawler.start] Douyin Crawler finished ...
2024-10-31 18:11:49 MediaCrawler INFO (db.py:84) - [close] close mediacrawler db pool
`
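For anyone trying to confirm which calls are failing in a log like the one above, a rough sketch that tallies entries by level and bracketed tag may help. The regular expression is an assumption inferred from the pasted output (timestamp, `MediaCrawler`, level, `(file:line)`, optional `[tag]`), not something taken from the project's logging configuration, and `count_entries` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed entry layout, inferred from the pasted log:
# "<date> <time> MediaCrawler <LEVEL> (<file>:<line>) - [<tag>] <message>"
LOG_ENTRY = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} MediaCrawler "
    r"(?P<level>[A-Z]+) \((?P<source>[^)]+)\) - "
    r"(?:\[(?P<tag>[^\]]+)\])?"
)

def count_entries(log_text: str) -> Counter:
    """Count log entries per (level, tag); fall back to file:line when no tag is present."""
    counts = Counter()
    # finditer scans the whole text, so it works whether or not entries are on separate lines
    for match in LOG_ENTRY.finditer(log_text):
        counts[(match["level"], match["tag"] or match["source"])] += 1
    return counts

# Three entries excerpted from the log in this issue
sample = (
    "2024-10-31 18:11:45 MediaCrawler ERROR (client.py:102) - request params incrr, response.text: "
    "2024-10-31 18:11:45 MediaCrawler ERROR (core.py:150) - [DouYinCrawler.get_aweme_detail] "
    "Get aweme detail error: account blocked, "
    "2024-10-31 18:11:47 MediaCrawler INFO (core.py:185) - [DouYinCrawler.get_comments] "
    "aweme_id: 7280854932641664319 comments have all been obtained and filtered ..."
)

counts = count_entries(sample)
errors = {key: n for key, n in counts.items() if key[0] == "ERROR"}
for (level, where), n in sorted(errors.items()):
    print(level, where, n)
```

Run against the full log, this should make it easy to see whether every ERROR entry comes from the detail request path while the comment entries stay at INFO.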

kayzh324 commented 6 days ago

The requested URL is as follows:

https://www.douyin.com/aweme/v1/web/user/profile/other/?sec_user_id=MS4wLjABAAAACRSsERMR_yxehfJ2GJLsoJ23iRJ_Leaxued3w3eoo6tnLuJ5NBD-D9tZ7YWoqBFi&publish_video_strategy_type=2&personal_center_strategy=1&device_platform=webapp&aid=6383&channel=channel_pc_web&version_code=190600&version_name=19.6.0&update_version_code=170400&pc_client_type=1&cookie_enabled=true&browser_language=zh-CN&browser_platform=MacIntel&browser_name=Chrome&browser_version=125.0.0.0&browser_online=true&engine_name=Blink&os_name=Mac+OS&os_version=10.15.7&cpu_core_num=8&device_memory=8&engine_version=109.0&platform=PC&screen_width=2560&screen_height=1440&effective_type=4g&round_trip_time=50&webid=1461045111061134479&msToken=6CiVlBfuili28hMlRPP1-huLaLnDf-MX0NO5IT3ufhO2v2N4P1ZIVUyOXk7wNkpDvfc_5QknGBnwOjGO8ydyxnfrkqIehLXt9DT3uIxrzQB7b2B4oG1L54bB2rm762AUAUsoy_lCp8i7QdsxQVu3ycPwMh0_kJ1KmICQyDFTiexRe6TGOfQkqg%3D%3D&a_bogus=EfWZM5ukdDIiDfSX5IQLfY3q6VB3Ygs-0trEMD2fld3bYy39HMYM9exoW10v5GWjNT%2FdIeYjy4hbT3ohrQ2y8qwf9W0L%2F25gsDSkKl12so0j53inCLf%2FE0iE5hsAtFH8svr4iKi8owICSYyhldAJ5kIlO62-zo0%2F94f%3D
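A URL this long is hard to read by eye. As a debugging aid, here is a minimal sketch that splits a request URL into its endpoint path and decoded query parameters using only the Python standard library. The helper name `summarize_request_url` is hypothetical, and the example URL is a shortened stand-in with placeholder token values, not the real request.

```python
from urllib.parse import urlsplit, parse_qs

def summarize_request_url(url: str) -> dict:
    """Split a request URL into host, endpoint path, and decoded query parameters."""
    parts = urlsplit(url)
    # keep_blank_values=True so parameters sent with an empty value are not dropped
    query = parse_qs(parts.query, keep_blank_values=True)
    # Flatten single-item lists; repeated parameters stay as lists
    params = {name: values[0] if len(values) == 1 else values
              for name, values in query.items()}
    return {"host": parts.netloc, "path": parts.path, "params": params}

# Shortened, illustrative URL: the token-like values are placeholders
example = (
    "https://www.douyin.com/aweme/v1/web/user/profile/other/"
    "?sec_user_id=PLACEHOLDER&device_platform=webapp&aid=6383"
    "&os_name=Mac+OS&a_bogus=abc%2Fdef%3D"
)

info = summarize_request_url(example)
print(info["path"])
for name, value in info["params"].items():
    print(f"{name} = {value}")
```

Printing the path first shows at a glance which endpoint the request targets, and the decoded parameter list can be diffed against a request captured from the browser.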

kayzh324 commented 6 days ago

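After testing: Douyin search mode works completely normally; detail mode cannot fetch the video data but can fetch the video comments; creator mode does not work at all. Still looking for the cause.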

Ja5onYng commented 6 days ago

I ran into this problem too. In creator mode, after crawling for a few minutes, every request starts returning account blocked.