yjqian02 closed this issue 2 years ago.
I think TikTok deprecated the hashtag search functionality (or moved it). The entire endpoint is gone (returning 404): https://www.tiktok.com/tag/
I already have an issue over at the R package repo: https://github.com/JBGruber/traktok/issues/4
So the endpoint is not gone, I just checked. When I look at the following API URL, it delivers the expected data. This means TikTok has changed the required parameters to deliver a valid response, which means that someone needs to go through the URL params to figure out which ones are necessary. The earliest I'll be able to get to that is next week most likely, but @JBGruber if you have time this week, please LMK which parameters are required and I'll fix it ASAP.
I also don't have time right now but will let you know if I find out more. Two new insights:
curl 'https://www.tiktok.com/api/search/item/full/?aid=1988&app_language=en&app_name=tiktok_web&battery_info=1&browser_language=en-GB&browser_name=Mozilla&browser_online=true&browser_platform=Linux%20x86_64&browser_version=5.0%20%28X11%3B%20Linux%20x86_64%29%20AppleWebKit%2F537.36%20%28KHTML%2C%20like%20Gecko%29%20Chrome%2F107.0.0.0%20Safari%2F537.36&channel=tiktok_web&cookie_enabled=true&count=20&device_id=7156603669521303045&device_platform=web_pc&focus_state=true&from_page=search&history_len=2&is_fullscreen=false&is_page_visible=true&keyword=%23rstats&offset=12&os=linux&priority_region=DE&referer=&region=NL&screen_height=1200&screen_width=1920&search_id=20221115133736010190209216170B7D15&tz_name=Europe%2FAmsterdam&verifyFp=verify_l9h6422m_m9VsjncG_5Ki6_49SS_BPx5_IeVGoLXl4P9h&webcast_language=en' \
-H 'authority: www.tiktok.com' \
-H 'accept: */*' \
-H 'accept-language: en-GB,en;q=0.9,de-DE;q=0.8,de;q=0.7,en-US;q=0.6' \
-H 'cookie: ***REDACTED***' \
-H 'referer: https://www.tiktok.com/search/video?q=%23rstats&t=1668512698958' \
-H 'sec-ch-ua: "Chromium";v="107", "Not=A?Brand";v="24"' \
-H 'sec-ch-ua-mobile: ?0' \
-H 'sec-ch-ua-platform: "Linux"' \
-H 'sec-fetch-dest: empty' \
-H 'sec-fetch-mode: cors' \
-H 'sec-fetch-site: same-origin' \
-H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36' \
--compressed
challengeID and cursor are both gone. Pagination seems to work through the offset=12 bit. The referer is just the search URL plus a Unix timestamp in milliseconds (the t=1668512698958 value). This might actually make things easier.
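A minimal sketch of that request scheme in Python, assuming only keyword and offset need to change from page to page and that t in the referer is the current time in milliseconds (the 13-digit value in the curl call suggests this). Both function names are hypothetical helpers, not part of pyktok or traktok.

```python
import time
from typing import Optional
from urllib.parse import urlencode

SEARCH_API = "https://www.tiktok.com/api/search/item/full/"


def build_search_url(keyword: str, offset: int) -> str:
    """URL for one page of search results (assumption: keyword + offset suffice)."""
    return SEARCH_API + "?" + urlencode({"keyword": keyword, "offset": offset})


def build_referer(keyword: str, now_ms: Optional[int] = None) -> str:
    """Search-page URL plus a millisecond Unix timestamp in the t parameter."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # milliseconds, like t=1668512698958
    return "https://www.tiktok.com/search/video?" + urlencode({"q": keyword, "t": now_ms})
```

For example, build_search_url("#rstats", 12) yields ...?keyword=%23rstats&offset=12, matching the offset used in the curl call.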
Yeah, /search/item/full works for me, and it's probably better in general since it can also be used to retrieve stitched videos. I'll make the change when time permits...
@JBGruber and @dfreelon The same thing happened in this issue from another project: https://github.com/davidteather/TikTok-Api/issues/976#issuecomment-1316747795
Removing any of the parameters from https://www.tiktok.com/api/challenge/item_list/ makes it return nothing.
https://www.tiktok.com/api/challenge/item_list/ still uses the cursor rather than the offset. I tried changing it and it still doesn't work; it seems that __signature has to change along with it, or msToken and X-Bogus do.
Do you have any idea how to resolve this?
@Jmallone Yes, use the endpoint @JBGruber identifies above in his curl code. You'll need to manually go through the parameters to figure out the required ones, but it should work better than /challenge/item_list/
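One way to do that manual search is a leave-one-out loop: drop each parameter in turn and record which removals break the response. This sketch assumes a caller-supplied is_valid function (hypothetical) that sends the request and decides whether the body contains usable data. It only finds parameters that are required on their own, not pairs that can stand in for each other, and every probe costs one request, so space them out.

```python
from typing import Callable, Dict, List


def find_required_params(
    params: Dict[str, str],
    is_valid: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Return the names of parameters whose removal makes the request fail."""
    if not is_valid(params):
        raise ValueError("the full parameter set does not produce a valid response")
    required = []
    for name in params:
        # Send every parameter except `name`.
        reduced = {k: v for k, v in params.items() if k != name}
        if not is_valid(reduced):
            required.append(name)
    return required
```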
@dfreelon Awesome
I was using /challenge/item_list/ to get videos from hashtags. Does /search/item/full/ do the same thing?
I used to take the challengeID of the hashtag, and with /search/item/full I use keyword instead of challengeID.
I think so. We use a two-step process: one function pulls video URLs from search, and a second function applied to those URLs gets the videos themselves. You could at least follow a similar approach.
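A sketch of that two-step structure, assuming the search response carries an item_list whose entries have id and author.uniqueId (the field names used in the draft code further down this thread). fetch_video stands in for whatever per-video function you use; both helper names are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, TypeVar

T = TypeVar("T")


def video_urls_from_search(data: Dict[str, Any]) -> List[str]:
    """Step 1: turn one page of search results into video page URLs."""
    urls = []
    for item in data.get("item_list", []):
        urls.append(f"https://www.tiktok.com/@{item['author']['uniqueId']}/video/{item['id']}")
    return urls


def process_urls(urls: Iterable[str], fetch_video: Callable[[str], T]) -> List[T]:
    """Step 2: apply a per-video function (e.g. a downloader) to each URL."""
    return [fetch_video(url) for url in urls]
```

Keeping the steps separate means the URL list can be saved and the slower per-video step rerun without repeating the search.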
I've got a working wrapper for it now over at the R package. I was able to cut down the API call quite a bit. You can search for users, hashtags or plain keywords without specifying which kind of search it is. This seems to be an improvement for data access:
Essentially, a URL for a user reads like this: https://www.tiktok.com/api/search/item/full/?keyword=%40chilipeppers&offset=0
For a hashtag: https://www.tiktok.com/api/search/item/full/?keyword=%2523rstats&offset=0
The only header I send is the cookie.
The only downside now is that the user who provides the cookies needs to be logged in.
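The two example URLs differ in how the keyword is encoded: "@" becomes %40, while the hashtag URL's %2523 is "#" percent-encoded twice (first to %23, then the % to %25). A small hypothetical helper makes the difference explicit; which form TikTok actually expects is an assumption to check against your own requests.

```python
from urllib.parse import quote


def encode_keyword(keyword: str, double: bool = False) -> str:
    """Percent-encode a search keyword for the keyword URL parameter.

    double=True encodes twice, turning '#' into '%2523' as in the hashtag URL.
    """
    encoded = quote(keyword, safe="")
    return quote(encoded, safe="") if double else encoded
```

Note that passing an already-encoded string such as '%23rstats' to requests via params= also produces %2523rstats, because requests encodes it once more.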
@JBGruber I have a question: is the offset for the next page the value of the cursor?
https://www.tiktok.com/api/search/item/full/?keyword=%2523perobal&offset=36
Yes! Using the cursor is a way better idea than what I came up with :facepalm:. I simply counted the videos that were already returned and used that number as offset.
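Pagination using the server's cursor, with a fallback to counting returned items as described above, can be sketched like this. It assumes the response exposes cursor and has_more fields as discussed in this thread; fetch_page is a hypothetical callable that requests one page for a given offset and returns the decoded JSON.

```python
from typing import Any, Callable, Dict, Iterator


def paginate(
    fetch_page: Callable[[int], Dict[str, Any]], max_items: int = 1000
) -> Iterator[Dict[str, Any]]:
    """Yield search results page by page until has_more is off or max_items is hit."""
    offset = 0
    seen = 0
    while seen < max_items:
        data = fetch_page(offset)
        items = data.get("item_list") or []
        for item in items:
            if seen >= max_items:
                return
            yield item
            seen += 1
        if not items or data.get("has_more") != 1:
            return
        # Prefer the server-supplied cursor; otherwise count the items returned.
        next_offset = int(data.get("cursor", offset + len(items)))
        if next_offset <= offset:
            return  # guard against a cursor that does not advance
        offset = next_offset
```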
More insights:
A minimal example in Python:
import requests
cookies = {
    'ttwid': '***REMOVED***',
    'sessionid': '***REMOVED***',
}
params = {
    'keyword': '%23rstats',
    'offset': '0',
}
response = requests.get('https://www.tiktok.com/api/search/item/full/', params=params, cookies=cookies)
print(response.text)
If we remove ttwid from the cookies, this message is returned:
{
"status_code":2483,
"status_msg":"Please login your account first",
"log_pb":{
"impr_id":"2022111718004042A4CCC7DE79FF1E30D0"
}
}
I think you will have to add delays between requests, because here I got:
{
"status_code":2484,
"status_msg":"Too many attempts. Try again in 1 hour."
}
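Because these errors come back as JSON rather than as HTTP errors, it helps to check status_code before reading item_list. The codes below are only the ones reported in this thread; treating 0 (or a missing code) as success is an assumption, and the label names are hypothetical.

```python
from typing import Any, Dict

# Codes observed in this thread; 0 meaning success is an assumption.
STATUS_LABELS = {
    0: "ok",
    2483: "login_required",  # "Please login your account first"
    2484: "rate_limited",    # "Too many attempts. Try again in 1 hour."
}


def classify_status(data: Dict[str, Any]) -> str:
    """Map a response's status_code to a coarse label the caller can act on."""
    code = data.get("status_code", 0)
    return STATUS_LABELS.get(code, "unknown_error")
```

A caller might refresh cookies on "login_required" and back off for an hour on "rate_limited", as the message suggests.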
I took the save_hashtag_video_urls function from pyktok.py as a draft.
The params must contain a device_id, the keyword and the offset. (The device_id can be any 19-digit number.)
The cookies only have to contain the ttwid cookie.
itemList has been renamed to item_list, and likewise hasMore has been renamed to has_more.
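A compatibility sketch for the renamed fields and the device_id requirement. It assumes any 19-digit number is accepted (as reported above) and reads both the old and new field names, so the same parser also works on older saved responses; the helper names are hypothetical.

```python
import random
from typing import Any, Dict, List


def random_device_id() -> str:
    """Any 19-digit number (first digit non-zero so the length is exactly 19)."""
    return str(random.randint(10**18, 10**19 - 1))


def get_items(data: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Accept both the new (item_list) and old (itemList) field names.
    return data.get("item_list") or data.get("itemList") or []


def has_more(data: Dict[str, Any]) -> bool:
    # has_more may be 1/0 or True/False; bool() handles both.
    return bool(data.get("has_more", data.get("hasMore", 0)))
```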
This works for me:
import random
import requests
import sys
import time
def get_videos_by_keyword(keyword, limit=1000):
    # The ttwid cookie is the only cookie this endpoint needs (see above).
    cookies = {
        'ttwid': '1%7CPm9bTMLMzjZ48RTfSWSxsyFOpGIaDfICGUjuSUtm4ng%7C1668717040%7Cdc9c307a7f02eeae1fed06237bd2d7635c52cf583dfcde8963d1580efc90cb35'}
    cursor = 0
    counter = 0  # total number of URLs printed so far, across all pages
    while counter < limit:
        params = {
            'device_id': '1234567890123456789',  # any 19-digit number
            'keyword': keyword,
            'offset': cursor,
        }
        try:
            response = requests.get(
                'https://www.tiktok.com/api/search/item/full/',
                params=params,
                cookies=cookies)
            data = response.json()
            videos = data['item_list']
        except Exception as e:
            # Stop instead of retrying the same offset forever.
            print('Stopped at cursor="' + str(cursor) + '": ' + repr(e))
            break
        for video in videos:
            # Other available fields: video['desc'], video['createTime'],
            # video['author'], video['stats']['playCount']
            url = 'https://tiktok.com/@' + video['author']['uniqueId'] + '/video/' + video['id']
            print(url)
            counter += 1
            if counter >= limit:
                break
        cursor += len(videos)
        if not videos or data.get('has_more') != 1:
            break
        time.sleep(random.randint(1, 3))  # pause between pages to avoid rate limits
    print('Done.')
def main():
    args = sys.argv[1:]
    if len(args) == 2 and args[0] == '-keyword':
        get_videos_by_keyword(args[1])
    elif len(args) == 4 and args[0] == '-keyword' and args[2] == '-limit':
        get_videos_by_keyword(args[1], int(args[3]))
    else:
        print('You have to enter some keyword, for example: -keyword "#fun #cats"')
if __name__ == "__main__":
    main()
@TimoBaeuerle Thanks for this first draft of a new function! Do you want to add it as a pull req, or are you OK with me copy-pasting the code in and crediting you in the README? If you do the former you will be listed as an official contributor, in case you care about that sort of thing.
@dfreelon Sure, I can add this function to the project's repo. Should I just add the function, or also update the existing save_hashtag_video_urls function to the new API URL and params?
Hi @TimoBaeuerle
Did you resolve the "Too many attempts" error with delays?
Hi @Jmallone,
since I started using the device_id parameter, I never got this error message again. Currently I'm not sure whether the delay at the end or the device_id is responsible for this. Maybe I'll find out today.
"Too many attempts" was also returned for me when I accidentally sent a malformed cookie string. I haven't seen it since, even without pauses between requests (but I've only requested a couple thousand videos so far for testing).
@TimoBaeuerle let me know what you find afterward :)
@JBGruber Interesting observation. In my tests I sent the complete cookie parameters, and after a few requests it simply "burned" the cookies and stopped working, and this "Too many attempts" message started to appear.
A bit of trivia: TikTok announced a research API update yesterday. I think that must be the reason for the recent API changes.
@TimoBaeuerle If it's OK with you, I'd like to make extensive revisions to your code before I merge it back in--I realized it's possible to pull not only URLs but also other metadata with each call to search/item/full, but it will require me to rethink other pieces of pyktok first. So it will likely take a few days, but I'll credit you in the README when I push the changes, unless you object for whatever reason.
For inspiration, these are the fields I pull. vpluck just means to return NA if the field doesn't exist in the json and to check the returned type (e.g., integer):
tibble::tibble(
  video_id = vpluck(json[[entries]], "video", "id"),
  video_timestamp = video_timestamp,
  video_url = vpluck(json[[entries]], "video", "downloadAddr"),
  video_length = vpluck(json[[entries]], "video", "duration", val = "integer"),
  video_title = vpluck(json[[entries]], "desc"),
  video_diggcount = vpluck(json[[entries]], "stats", "diggCount", val = "integer"),
  video_sharecount = vpluck(json[[entries]], "stats", "shareCount", val = "integer"),
  video_commentcount = vpluck(json[[entries]], "stats", "commentCount", val = "integer"),
  video_playcount = vpluck(json[[entries]], "stats", "playCount", val = "integer"),
  video_description = vpluck(json[[entries]], "desc"),
  video_is_ad = vpluck(json[[entries]], "isAd", val = "logical"),
  author_name = author_name,
  author_followercount = vpluck(json[[entries]], "authorStats", "followerCount", val = "integer"),
  author_followingcount = vpluck(json[[entries]], "authorStats", "followingCount", val = "integer"),
  author_heartcount = vpluck(json[[entries]], "authorStats", "heartCount", val = "integer"),
  author_videocount = vpluck(json[[entries]], "authorStats", "videoCount", val = "integer"),
  author_diggcount = vpluck(json[[entries]], "authorStats", "diggCount", val = "integer")
)
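For the Python side, a rough analogue of vpluck: follow a key path through nested dicts and return None when a key is missing or the value has the wrong type. This is a sketch, not the traktok implementation; it returns None on a type mismatch rather than coercing, and the hypothetical video_record covers only a few of the fields listed above.

```python
from typing import Any, Dict, Optional, Type


def pluck(entry: Dict[str, Any], *path: str, typ: Optional[Type] = None) -> Any:
    """Follow `path` into nested dicts; None if a key is missing or the type is wrong."""
    value: Any = entry
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    if typ is not None and not isinstance(value, typ):
        return None
    return value


def video_record(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Collect a few metadata fields from one search result entry."""
    return {
        "video_id": pluck(entry, "video", "id"),
        "video_length": pluck(entry, "video", "duration", typ=int),
        "video_description": pluck(entry, "desc"),
        "video_playcount": pluck(entry, "stats", "playCount", typ=int),
        "author_followercount": pluck(entry, "authorStats", "followerCount", typ=int),
    }
```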
@JBGruber Thanks--my idea is to build out a separate function that pulls all the metadata fields and can be used either for a single video or for the results of a search/item/full request. That will minimize the number of requests to the TikTok server and speed up runtime... trouble is finding time to actually write out the code...
Hey, thanks for the great research and work.
I'm currently researching the TikTok API, and I recently found out that the API I'm using (https://m.tiktok.com/api/challenge/item_list/) doesn't work anymore.
I want to ask: is it possible to specify how much data is returned per page with the search/item/full API? I've tried using the cursor, but it doesn't work.
@azickri We're working on it, see upthread...
@dfreelon, that's great. If I find a solution, can I share it here?
@azickri Sure thing, although I have a pretty good idea of how I want to do it, so I may borrow bits of your code rather than integrating it intact, if that's OK
That's OK with me, thanks ;)
Hi all, thanks for finding a solution so fast! If I am not mistaken, the latest changes in the TikTok API also broke the save_video_comments() function. Today I was trying to find a solution, but it looks like we now have to supply additional parameters like X-Bogus, which are URL-specific. If we change any of the other parameters (e.g. cursor), X-Bogus has to be changed as well somehow; otherwise we start to get empty responses.
Another error message, returned after trying to fetch too much:
{
"status_msg":"You have reached the maximum number of searched today.",
"log_pb":{"impr_id":"2022111811254781019205416127D860D2"},
"status_code":2484
}
even with deviceId set and delays. @TimoBaeuerle, did this ever happen to you with your code?
@stefanuq Well in my defense I did try to warn y'all... I'll look into it and try to have some workable solutions by early next week. In the meantime, anyone is free to post working code here at any time, and I'll credit you if I end up using it.
@Jmallone You need a delay of about 2 seconds between offsets and 10 to 20 seconds between searches, and then you can run it 24/7.
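A sketch of that pacing rule, treating the quoted delays as rough heuristics rather than documented limits; the 1.5 to 2.5 s jitter around "about 2 seconds" is an arbitrary choice for illustration, and the Throttle class is hypothetical. The sleep function is injectable so the timing logic can be tested without waiting.

```python
import random
import time
from typing import Callable


class Throttle:
    """Sleep before each request so consecutive calls are spaced out."""

    def __init__(self, sleep: Callable[[float], None] = time.sleep):
        self.sleep = sleep

    def before_page(self) -> float:
        # Roughly 2 s between offsets of the same search.
        delay = random.uniform(1.5, 2.5)
        self.sleep(delay)
        return delay

    def before_search(self) -> float:
        # 10 to 20 s before starting a new search.
        delay = random.uniform(10, 20)
        self.sleep(delay)
        return delay
```

Call before_search() before the first page of each keyword and before_page() before each later offset.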
OK, I looked into this over the weekend and have a few observations:
comments, it's been blank for every video I've checked.
/api/comment/list/, resend the request with all necessary URL params, and capture the results.
I hesitate to implement anything using Selenium for several reasons:
But if anyone figures out anything faster for comments between now and then, LMK and I'll consider implementing it.
@dfreelon Nice observations.
The problem with waiting for TikTok to do something about it is not knowing when it will release this official API.
@Jmallone I'm willing to wait and see in the short term (also because I have other commitments...) but I will at least fix the search function, likely over the next few days
@Jmallone I have never seen this error message. The bot I'm building is still under development, so maybe I will see more errors in production.
OK, just pushed out new functions that can pull from search/item/full and get the comments initially visible on a video page. Try it out and LMK if you encounter problems... also, the new version is not available on PyPI yet; I'll do that later tonight.
OK, the latest version is now up on PyPI, so I'm going to close this until someone finds something wrong with it...
Has anyone observed that this search isn't exactly the same as the hashtag search?
E.g. tiktok.com/tag/
but this endpoint is giving me: https://us.tiktok.com/api/search/item/full/?keyword=#votigo&offset=0
For the keyword "votigo" (something less popular), I see results for vertigo, veriligo, vitiligo, virgo, voti.
This is doing an actual search for anything that remotely matches the keyword.
Any ideas on how to do an exact match?
@mandys Yes, this is also what happens with the search when you use a browser. I find it annoying as well--the only thing I can think of is to search first and filter your results. (Also hashtag search is now prohibitively difficult to use programmatically; see upthread.)
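The search-then-filter approach can be sketched as below. It assumes each result's desc field holds the caption with its hashtags (desc appears in the field lists in this thread) and matches hashtags case-insensitively, which is also an assumption; videos whose caption lacks the hashtag would be missed. The function name is hypothetical.

```python
import re
from typing import Any, Dict, List


def filter_exact_hashtag(items: List[Dict[str, Any]], tag: str) -> List[Dict[str, Any]]:
    """Keep items whose caption contains `#tag` as a whole hashtag (case-insensitive)."""
    # (?!\w) stops '#votigo' from matching '#votigo2023'.
    pattern = re.compile(r"#" + re.escape(tag.lstrip("#")) + r"(?!\w)", re.IGNORECASE)
    return [item for item in items if pattern.search(item.get("desc") or "")]
```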
Does anybody have the same issue? I tried hitting the API (tiktok.com/api/search/full) with the full request cookie, but the response is:
{
status_msg: 'Please login your account first',
log_pb: { impr_id: '202211240158570102450591401C07BBFC' },
status_code: 2483
}
You may need to be logged in to TikTok for some functions to work. Let me put that in the docs... I would also try setting the browser param to a browser on your system from which you have visited TT while logged in.
@mandys I should point out that it's fairly easy to pull the first 15 videos displayed on a hashtag search. I will probably write a function to do this at some point, but right now you can do something like:
import pyktok as pyk
ui = pyk.get_tiktok_json('https://www.tiktok.com/tag/uidesign')
#then parse through the data in ui['ItemModule']
Not nearly as good as before, but better than nothing.
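One way to do the parsing step, assuming each value in ItemModule has an id and an author that is either a username string or a dict with uniqueId; inspect ui['ItemModule'] yourself before relying on these keys. The function name is hypothetical.

```python
from typing import Any, Dict, List


def urls_from_item_module(page_json: Dict[str, Any]) -> List[str]:
    """Build video URLs from the ItemModule section of a page's JSON."""
    urls = []
    for item in page_json.get("ItemModule", {}).values():
        author = item.get("author")
        if isinstance(author, dict):
            author = author.get("uniqueId")
        # Skip entries missing either piece needed for a URL.
        if author and item.get("id"):
            urls.append(f"https://www.tiktok.com/@{author}/video/{item['id']}")
    return urls
```

With ui from the snippet above, urls_from_item_module(ui) would list the videos embedded in the tag page.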
That's great, I will use this method if the API doesn't work. Thanks for the advice.
OK, I did this: see the new function save_tiktok_multi_page, which works with hashtag, user, and music pages. Only 30 videos per user and 15 per hashtag and song, but sometimes it be like that...
I feel like I'm missing something extremely obvious: pyk.save_tiktok_by_keyword (with save videos enabled) was working fine all day, but now it suddenly errors out immediately because there's no item_list?
KeyError: 'item_list'
Stopped at cursor= 0
Other functions still seem to be working fine so I don't think I've been banned by tiktok
@idksomethinggeneric If you've been running it all day, my guess is TT might be throttling your usage. You can troubleshoot this by plugging any video URL into get_tiktok_json and inspecting the resulting JSON object. It should be quite large, but if it's small and/or conveys a message like this:
{
"status_code":2484,
"status_msg":"Too many attempts. Try again in 1 hour."
}
...you should probably give it some time.
Does anybody have the same issue now? When I send a request with the full cookie after logging in to TikTok, the API https://www.tiktok.com/api/search/item/full/ returns a blank response with status 200.
Have you gone back to the TikTok website? You should be logged in, and sometimes you get a captcha you need to solve. Only then are your cookies valid for making requests. The API annoyingly almost always returns 200.
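Since the HTTP status is almost always 200, it is safer to check the body itself. A minimal sketch, assuming a usable response is a JSON object; a blank body or a non-JSON page (such as HTML) yields None so the caller can refresh the cookies or solve the captcha. The function name is hypothetical.

```python
import json
from typing import Any, Dict, Optional


def parse_api_body(text: str) -> Optional[Dict[str, Any]]:
    """Return the decoded JSON object, or None if the body is blank or not a JSON object."""
    if not text or not text.strip():
        return None  # the "blank response with status 200" case
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # e.g. an HTML page instead of JSON
    return data if isinstance(data, dict) else None
```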
@JBGruber, of course. I checked tiktok.com and copied a new cookie, but got the same result. Do you have the same issue, or does it work normally for you?
Thanks for all your hard work on this module! I've been using it to scrape TikTok videos by hashtag for a research study, but today when I try to run save_hashtag_video_urls() with any hashtag, I keep getting the following output:
and it keeps repeating. I'm still able to use the other functions, and I first noticed this issue around 12 pm CDT today. Would this be an issue with the TikTok API changing?