Johnserf-Seed / f2

High-speed downloader for multiple platforms
https://johnserf-seed.github.io/f2/
Apache License 2.0

[BUG] fetch_user_post returns an incomplete video list for a given sec_uid #116

Open ganlnyn0000 opened 2 weeks ago

ganlnyn0000 commented 2 weeks ago

I am using the latest version and replaced the Cookie in test.yaml. Here is my test code:

    async with DouyinCrawler(TestConfigManager.get_test_config("douyin")) as crawler:
        sec_uid = "MS4wLjABAAAARHbYBn84JChECWkdFOJ0r8t6jaxCS6VNSCGl4SpP0pE"
        params = UserProfile(
            sec_user_id=sec_uid,
        )
        response = await crawler.fetch_user_profile(params)
        assert response, "Failed to fetch user profile"
        print(f"aweme_count: {response.get('user').get('aweme_count')}")
        max_cursor = 0
        aweme_count = 0
        p = 1
        while True:
            params = UserPost(
                max_cursor=max_cursor,
                count=10,
                sec_user_id=sec_uid,
            )
            response = await crawler.fetch_user_post(params)
            assert response, "Failed to fetch user post"
            print(f"page {p}: aweme_list count: {len(response.get('aweme_list'))}, has_more: {response.get('has_more')}")
            p = p+1
            aweme_count = aweme_count+len(response.get('aweme_list'))
            video = UserPostFilter(response)
            #video_id = video.aweme_id
            #print(video_id)
            if response.get('has_more')==0 or len(response.get('aweme_list'))==0:
                break
            max_cursor = response['max_cursor']
        print(f"fetch_user_post aweme_count: {aweme_count}")

Here is the printed output of the run (detailed results are omitted because there is too much data):

    aweme_count: 241
    page 1: aweme_list count: 9, has_more: 1
    page 2: aweme_list count: 10, has_more: 1
    page 3: aweme_list count: 2, has_more: 1
    page 4: aweme_list count: 0, has_more: 1
    fetch_user_post aweme_count: 21

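This sec_uid has 241 videos, but by page 4 aweme_list is already [], and only 21 videos were retrieved in total. I tested 10 sec_uids: 5 could be fetched completely and 5 only partially. The number of videos does not seem to matter either, since some accounts with several thousand videos can be fetched completely. @Johnserf-Seed could you take a look at what the problem is?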

Johnserf-Seed commented 2 weeks ago

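Whether collection is complete is determined by the has_more field in the response. The max_cursor parameter that drives paging is a timestamp, and a page comes back empty when this author did not publish anything during that time window. So the only condition you need to check to decide that collection is finished is whether has_more is 0. Hope this answers your question. @ganlnyn0000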

ganlnyn0000 commented 1 week ago

@Johnserf-Seed Thanks a lot! I'll try again.