HohnerJulian / ResearchTikPy

Python API wrapper for the TikTok Research API
MIT License

Questions about using TikTok Research API #2

Open scn0901 opened 2 months ago

scn0901 commented 2 months ago

I ran into some odd behavior when using the TikTok Research API (note: I wrote the Python code myself rather than using your repo). I would like to ask whether you have encountered these problems too, so I can make sure they are not caused by bugs in my code.

  1. I can’t collect liked videos (regardless of whether the user has set it to private or not).
  2. I can only collect reposted videos of some users (I’m not sure whether this is because some users have set them to private).
  3. For user followers, I only seem to be able to get at most 5,000 followers of one user through the API, even though some users actually have tens of thousands of followers (as seen on the user profile).

Looking forward to your answer, thank you!

gioda commented 2 months ago

I've developed code similar to this repository for use with the TikTok Research API and have encountered issues akin to those mentioned by @scn0901, particularly with point 3. Despite being able to view all followers on the TikTok platform, the API only returns data for followers with public accounts. Additionally, I am unable to retrieve more than approximately 4,000 to 5,000 followers for accounts that have millions of followers. I have attempted to contact the TikTok developer team for clarification but have not yet received a response.

Any assistance or further information on this matter would be greatly appreciated.

HohnerJulian commented 1 month ago

Hi there! Sorry for taking so much time to respond.

  1. Collecting liked videos: I tested the function again, and it works in my library. I used data from accounts that appear in this list: https://www.tiktok.com/discover/best-accounts-for-liked-videos?lang=de-DE.

It's hard to help you without code snippets, but my guess is that this might be due to an error in your code, or perhaps because the person you are trying to collect data from is outside the US or EU?

  2. Irregularities in search queries: I also discovered irregularities regarding the complete collection of all videos of one user, or for example, when collecting videos by hashtag. The FAQ of TikTok states: "The User info API only retrieves data for an individual user, so we use online data. However, the video query API searches for the full dataset, so we use archived data instead of the current online data. New videos take up to 48 hours to be added to the search engine, and statistics such as view count and follower count can take up to 10 days to update."

This might be one reason for the issue, but I agree with you that there might also be different problems on the API side.
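
As a rough illustration of the delays quoted from the FAQ, the sketch below flags videos that may be too recent to appear in search results or to have up-to-date statistics. The function name `freshness` and the labels it returns are hypothetical, and the 48-hour and 10-day figures are simply the upper bounds from the quote, not guaranteed behavior of the API.

```python
from datetime import datetime, timedelta, timezone

# Upper bounds taken from the FAQ quote above (assumed, not guaranteed).
INDEXING_DELAY = timedelta(hours=48)
STATS_DELAY = timedelta(days=10)


def freshness(create_time: int, now: datetime) -> str:
    """Classify a video by age; create_time is a Unix timestamp in UTC seconds."""
    age = now - datetime.fromtimestamp(create_time, tz=timezone.utc)
    if age < INDEXING_DELAY:
        return "may be missing from search"
    if age < STATS_DELAY:
        return "indexed, statistics may be stale"
    return "settled"
```

Comparing the `create_time` of videos that seem to be missing against these thresholds can help separate indexing delay from other causes.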

  3. Only 5000 followers: That's also true for me, and I will add that to the documentation of ResearchTikPy. I don't know the reason behind this, but you get the same number of followers if you scrape them instead. If you scroll through the followers list of one account, you will find that the pagination stops at approximately 5000 followers. Apparently, TikTok currently only lists up to 5000 followers in their endpoint. I will open a ticket on this matter on the TikTok Support site.
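
A minimal sketch of how the cap could be detected during collection, assuming the followers response carries `user_followers`, `has_more` and `cursor` fields. Here `fetch_page` is a hypothetical callable that performs one request and returns the parsed `data` object, and `looks_truncated` with its `cap=5000` default is an illustrative helper based only on the numbers observed in this thread.

```python
from typing import Callable, Dict, List, Optional


def collect_followers(fetch_page: Callable[[Optional[int]], Dict],
                      max_pages: int = 1000) -> List[Dict]:
    """Follow has_more/cursor until the API stops returning pages.

    fetch_page is hypothetical: it takes a cursor (None for the first
    page) and returns the parsed 'data' object of one response.
    """
    followers: List[Dict] = []
    cursor: Optional[int] = None
    for _ in range(max_pages):
        data = fetch_page(cursor)
        followers.extend(data.get("user_followers", []))
        if not data.get("has_more"):
            break
        cursor = data.get("cursor")
    return followers


def looks_truncated(collected: int, profile_count: int, cap: int = 5000) -> bool:
    """True when the profile shows more than `cap` followers but at most `cap` were collected."""
    return profile_count > cap and collected <= cap
```

Recording `looks_truncated` next to each account makes it easier to tell capped follower lists apart from complete ones in later analysis.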

Sorry if this was not that helpful, but I could try to help you if you send me your code regarding point 1.

scn0901 commented 1 month ago

That's okay, your answer was very detailed and helpful! Especially 3 "Only 5000 followers", which helped me determine that the problem was not due to my code.

Regarding my problem 1, this is the main function I wrote. The global variables mentioned in this function have been initialized before use. If you need any more details, please let me know and I'll be happy to provide them.

import time
from typing import List, Optional, Tuple

import requests


def get_liked_videos_of_a_user(username: str, cursor: Optional[int] = None) -> Tuple[List, int]:
    '''
    Given a username and its start cursor, retrieve all liked videos of that user, as well as -1 (means completed) or last_cursor (means the last cursor for recovery).

    Parameters:
    username: str, the name of the user.
    cursor: Optional[int] = None, start cursor; defaults to the current Unix time at call time.

    Return:
    result: Tuple[List, int], containing (liked_videos_all, -1 / last_cursor).
                liked_videos_all: List, all data collected, ['forbidden'] if API access is forbidden.
                -1 / last_cursor: int, -1 means completed, last_cursor means the last cursor for recovery.
    '''

    global ACCESS_TOKEN  # set global variable ACCESS_TOKEN, as we may need to refresh it if it is invalid

    global CURRENT_REQUESTS_COUNT, CURRENT_TRIALS_COUNT  # set global variable CURRENT_REQUESTS_COUNT, CURRENT_TRIALS_COUNT, we need them to count current requests and trials

    CURRENT_TRIALS_COUNT = 0  # CURRENT_TRIALS_COUNT reset to 0 at the beginning

    liked_videos_all = []  # use liked_videos_all to store all data collected

    # set the initial value of last_cursor; a default of int(time.time()) in the signature would be evaluated only once, when the function is defined
    last_cursor = int(time.time()) if cursor is None else cursor

    url = 'https://open.tiktokapis.com/v2/research/user/liked_videos/?fields=id,create_time,username,region_code,video_description,music_id,like_count,comment_count,share_count,view_count,hashtag_names'

    headers = {
        'Authorization': f'Bearer {ACCESS_TOKEN}',
    }

    request_body = {
        'username': username,
        'max_count': 100,
        'cursor': last_cursor,
    }

    while True:  # continue sending request to retrieve all liked videos of that user

        # if CURRENT_REQUESTS_COUNT reach MAX_REQUESTS_COUNT, return liked_videos_all with last_cursor
        if CURRENT_REQUESTS_COUNT == MAX_REQUESTS_COUNT:
            return (liked_videos_all, last_cursor)

        # if CURRENT_TRIALS_COUNT reach MAX_TRIALS_COUNT, return liked_videos_all with -1
        if CURRENT_TRIALS_COUNT == MAX_TRIALS_COUNT:
            return (liked_videos_all, -1)

        try:  # try to get json response
            CURRENT_REQUESTS_COUNT += 1  # CURRENT_REQUESTS_COUNT add 1
            CURRENT_TRIALS_COUNT += 1  # CURRENT_TRIALS_COUNT add 1
            response = requests.post(url, headers=headers, json=request_body)  # send request and get response
            response_json = response.json()  # change response into json format
        except (requests.RequestException, ValueError):  # if the request or JSON decoding fails, sleep 60 seconds, then continue
            time.sleep(60)
            continue

        # process json response according to different situations

        # situation 1 (ok): CURRENT_TRIALS_COUNT reset to 0, collect current data (if fails, return liked_videos_all with -1), update last cursor
        if response_json['error']['code'] == 'ok':
            CURRENT_TRIALS_COUNT = 0
            try:
                liked_videos_all.extend(response_json['data']['user_liked_videos'])
            except (KeyError, TypeError):  # 'data' or 'user_liked_videos' missing from the response
                return (liked_videos_all, -1)
            last_cursor = response_json['data']['cursor']
            # situation 1-1 (has more data): set new cursor in request_body
            if response_json['data']['has_more']:
                request_body['cursor'] = last_cursor
            # situation 1-2 (has no more data): return liked_videos_all with -1
            else:
                return (liked_videos_all, -1)

        # situation 2 (access_token_invalid): refresh access token and headers
        elif response_json['error']['code'] == 'access_token_invalid':
            ACCESS_TOKEN = get_client_access_token(CLIENT_KEY, CLIENT_SECRET)
            headers['Authorization'] = f'Bearer {ACCESS_TOKEN}'

        # situation 3 (daily_quota_limit_exceeded): return liked_videos_all with last_cursor
        elif response_json['error']['code'] == 'daily_quota_limit_exceeded':
            return (liked_videos_all, last_cursor)

        # situation 4 (forbidden): return ['forbidden'] with -1
        elif response_json['error']['code'] == 'forbidden':
            return (['forbidden'], -1)

        # situation 5 (other errors): sleep 60 seconds
        else:
            time.sleep(60)
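
When debugging a loop like the one above, it may help to log the error object in the "other errors" branch instead of only sleeping, since an unexpected error code would otherwise go unnoticed. The helper below is a small sketch: `describe_error` is a hypothetical name, and the `message` and `log_id` keys are assumed to be present alongside `code` in the response's `error` object.

```python
from typing import Any, Dict


def describe_error(response_json: Dict[str, Any]) -> str:
    """Summarise the 'error' object of one parsed response for logging."""
    error = response_json.get("error") or {}
    code = error.get("code", "<missing>")
    message = error.get("message", "")   # assumed key
    log_id = error.get("log_id", "")     # assumed key
    return f"code={code} message={message!r} log_id={log_id}"
```

Printing `describe_error(response_json)` before `time.sleep(60)` would show whether the liked-videos requests fail with a specific code or succeed with an empty list.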

Regarding my problem 2, I would like to ask whether you have used the API to collect users' reposted videos (although ResearchTikPy doesn't have this feature yet). If so, have you found that the API returns no data for many users when you query their reposted videos? In my case this happens for more than 80% of my users. Do you know the reason (for example: the users have not actually reposted anything, they have set reposts to private, an API restriction, etc.)?

What I know so far: when I checked the profile pages of some users for whom the API returned no reposted videos, I found that most of these pages (in the mobile app) did not even have a "repost" tab, while a small number had a "repost" tab with no reposted videos in it. "Repost" seems to be a new TikTok feature, and some users have complained about inconsistencies when using it (see, for example, the comments on https://www.tiktok.com/@metricoolapp/video/7197228299922263302), but I cannot find anything related to "repost" in the TikTok help center.

Thank you for your help!