[BUG] - Hashtag search cannot fetch all media data, even when looping

zhangzyg commented 4 months ago

I want to search for Ukraine-related videos in the US region, but I can only fetch 30-50 records. TikTok itself shows 7.1M records for the hashtag. Is it possible to download all of them, or is there any way to search by time range?

My code snippet:

async def search_videos_hashtag(hashtag, time_from, time_to, current_video_amount=0, count=100, times=0) -> None:
    global result, api, current_os, result_tik_id_set
    format_style = '%m/%d/%y' if current_os == 'Windows' else '%Y/%m/%d'
    sleep(random.Random().randint(a=3, b=5))
    temp = 0
    temp_video_amount = current_video_amount
    if api is not None:
        if len(api.sessions) == 0:
            await api.create_sessions(ms_tokens=[ms_token], num_sessions=1, sleep_after=3, headless=False)  # ms_token is None
        async for searchRes in api.hashtag(hashtag).videos(count=count, cursor=current_video_amount):
            temp += 1
            current_video_amount += 1
            time_to_add_one_day = int((
                datetime.fromtimestamp(format_str_timestamp(time_to, format_style)) + timedelta(days=1)).timestamp())
            if format_str_timestamp(time_from, format_style) <= searchRes.as_dict['createTime'] <= time_to_add_one_day \
                    and searchRes.id not in result_tik_id_set:
                author = construct_author_metadata(searchRes)
                publish = construct_publish_metadata(searchRes)
                author.append_publish(publish)
                result.append(author)
                result_tik_id_set.add(searchRes.id)
                print('append one tik tok data, current search: ' + str(current_video_amount))
        if temp_video_amount == current_video_amount:
            sleep(random.Random().randint(a=3, b=5))
            video_urls = list(map(lambda res: res.publish[0].link, result))
            for url in video_urls:
                await search_related_videos(url, time_from, time_to, required_video_amount=count,
                                            current_video_amount=0, count=int(count / len(video_urls)))
        if temp < count and times < 100:
            await search_videos_hashtag(hashtag, time_from, time_to, current_video_amount, count, times=times + 1)
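
The date-window check in the snippet above can be pulled out into a small pure helper, which makes it testable without a live session. This is only a minimal sketch: it assumes `createTime` is a Unix timestamp in seconds (as the comparison above implies) and that the dates should be read as UTC. `in_time_range` and its default date format are hypothetical and not part of TikTokApi.

```python
from datetime import datetime, timedelta, timezone


def in_time_range(create_time: int, date_from: str, date_to: str,
                  fmt: str = "%Y/%m/%d") -> bool:
    """Return True if create_time (Unix seconds) lies between date_from and
    date_to, counting the whole of the end day. Dates are read as UTC, which
    is an assumption made for this sketch."""
    start = datetime.strptime(date_from, fmt).replace(tzinfo=timezone.utc)
    # Add one day so that the end date is included in full.
    end = datetime.strptime(date_to, fmt).replace(tzinfo=timezone.utc) + timedelta(days=1)
    return start.timestamp() <= create_time < end.timestamp()
```

Note that this only filters results that were already fetched; it does not make the hashtag endpoint return more records.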

sameerahmedcls commented 3 months ago

Can you send your full code?

vagvalas commented 3 months ago

I can confirm that this was already a problem before 6.4: on 6.3.0 I could not get past 45 videos. With 6.4 and later we can finally fetch a larger amount (I reached 340 videos), but the loop keeps returning the same videos again and again. yt-dlp, which receives each fetched URL, keeps reporting that the video has already been downloaded.

Here is my code:


from TikTokApi import TikTokApi
from yt_dlp import YoutubeDL
import asyncio
import os
from TikTokApi.exceptions import EmptyResponseException, TikTokException

ms_token = os.environ.get("multi_sids", "tjDG1O3i59WDpaK2v-spT5hmt1NcSJufT17v7cwvveTTqtYyq0N9mtAU-j76lfb7_msyycgSNt38AJVj2GF_KSxME27wc4C73eCVfSNsBs98TlO4PTOd2CEk7iRCm7kiFy7SPqKhUt33xvJ_LVtU")
ydl_opts = {
    'outtmpl': '%(uploader)s_%(id)s_%(timestamp)s.%(ext)s',
}

async def download_hashtag_videos(hashtag):
    async with TikTokApi() as api:
        try:
            await api.create_sessions(ms_tokens=[ms_token], num_sessions=1, sleep_after=3,
                                      headless=False, suppress_resource_load_types=["image", "media", "font", "stylesheet"])

            tag = api.hashtag(name=hashtag)
            more_videos = True
            while more_videos:
                videos = tag.videos(count=5000)
                video_list = []

                async for video in videos:
                    video_list.append(video)

                if not video_list:
                    more_videos = False
                    break

                for video in video_list:
                    print(f"Username: {video.author.username}")
                    print(f"Video ID: {video.id}")
                    print(f"Stats: {video.stats}")

                    video_url = f"https://www.tiktok.com/@{video.author.username}/video/{video.id}"
                    try:
                        with YoutubeDL(ydl_opts) as ydl:
                            ydl.download([video_url])
                    except Exception as e:
                        print(f"Error downloading video {video.id}: {e}")

        except EmptyResponseException as e:
            print(f"EmptyResponseException: {e}")
        except TikTokException as e:
            print(f"TikTokException: {e}")
        except Exception as e:
            print(f"An unexpected error occurred: {e}")

if __name__ == "__main__":
    hashtag = 'coldplayathens'
    asyncio.run(download_hashtag_videos(hashtag))    

TikTokApi: 6.5.2
Python 3.12
Playwright: 1.39.00
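
The repetition described above would be consistent with the `while` loop calling `tag.videos(count=5000)` afresh on every pass, so that each pass starts again from the beginning. That is a reading of the posted code, not a confirmed diagnosis. A minimal sketch of tracking seen IDs and stopping once a pass yields nothing new follows; `collect_unique` and `fake_videos` are hypothetical names, and the stand-in generator takes the place of a real `tag.videos(...)` call.

```python
import asyncio
from types import SimpleNamespace
from typing import AsyncIterator, Callable, List, Set


async def collect_unique(make_iter: Callable[[], AsyncIterator],
                         max_passes: int = 5) -> List:
    """Iterate make_iter() up to max_passes times, keeping each video ID once.
    Stops early when a full pass produces no unseen video."""
    seen: Set[str] = set()
    unique = []
    for _ in range(max_passes):
        new_in_pass = 0
        async for video in make_iter():
            if video.id in seen:
                continue  # duplicate from an earlier page or pass
            seen.add(video.id)
            unique.append(video)
            new_in_pass += 1
        if new_in_pass == 0:
            break  # nothing new, so further passes would only repeat
    return unique


async def fake_videos():
    """Stand-in for tag.videos(...): yields objects with an .id, with repeats."""
    for vid in ["1", "2", "2", "3"]:
        yield SimpleNamespace(id=vid)
```

With a real session, something like `await collect_unique(lambda: tag.videos(count=100))` would be the equivalent call, and only the returned videos would then be passed to yt-dlp. This removes the repeated downloads but does not by itself raise the number of distinct videos the endpoint returns.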

vagvalas commented 3 months ago

Besides that, it also seems to fetch videos that do not belong to the requested hashtag. For example, https://www.tiktok.com/@tashawishesyouluck/video/7407303973583015201 is not even under the hashtag 'coldplayathens'.
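
One possible workaround is to filter client-side on the hashtags attached to each result. This is a minimal sketch that assumes each raw video dict (the `as_dict` used in the first snippet) carries a `challenges` list whose entries have a `title` key. That layout is an assumption about the response and may change, and `has_hashtag` is a hypothetical helper, not a TikTokApi function.

```python
def has_hashtag(video_data: dict, hashtag: str) -> bool:
    """Return True if the raw video dict lists the given hashtag.
    Assumes hashtags live under video_data["challenges"][i]["title"]."""
    wanted = hashtag.lstrip("#").lower()
    challenges = video_data.get("challenges") or []
    # Compare case-insensitively; missing titles are treated as empty strings.
    return any((c.get("title") or "").lower() == wanted for c in challenges)
```

In the loop above this would be used as `if has_hashtag(video.as_dict, hashtag): ...` before building the download URL.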

davidteather / TikTok-Api

[BUG] - Hashtag search cannot fetch all media data, even when looping #1175