dfreelon / pyktok

A simple module to collect video, text, and metadata from Tiktok.
BSD 3-Clause "New" or "Revised" License

Video file download doesn't seem to be working #12

Closed. aurman21 closed this 1 year ago

aurman21 commented 1 year ago

Video file download seems to have been broken for a couple of days. The metadata still downloads properly, but instead of a proper video file I get a ~400 byte (more or less empty) file with pyktok. With traktok (which I originally used, and which worked well until a few days ago) I just get HTTP 403 (same issue: metadata gets downloaded, but no video). I came here from traktok to see whether the issue is isolated to it (it seems not).

I also tried this on 2 different computers and 4 different wifi networks across 2 countries, changed the cookie files, and updated all the libraries I could; I still consistently get HTTP 403 in traktok (and a broken <1 kB file with pyktok). I also tried not just the account I am interested in now, but also accounts I successfully downloaded videos from just last week and the one in the example here; all have the same issue.

To reproduce: the code from the video download example is enough to trigger the issue. If this still works for everyone else, I will try to work out what could be causing it on my side beyond everything I've tried (as described above).

pyk.save_tiktok('https://www.tiktok.com/@tiktok/video/7106594312292453675?is_copy_url=1&is_from_webapp=v1', True, 'video_data.csv')
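A quick way to confirm the symptom described above (a tiny non-video file saved with an .mp4 name) is to check the size and the first bytes of the saved file. This is only a heuristic sketch, not part of pyktok: the function name looks_like_mp4 and the 1024-byte threshold are assumptions made for illustration, and it relies on the common convention that an MP4 file starts with an "ftyp" box.

```python
import os


def looks_like_mp4(path, min_bytes=1024):
    """Heuristic: return True if the file at `path` is plausibly an MP4 video.

    `min_bytes` is an arbitrary illustrative threshold; the broken downloads
    reported in this issue were only a few hundred bytes long.
    """
    if os.path.getsize(path) < min_bytes:
        return False
    with open(path, "rb") as f:
        header = f.read(12)
    # MP4 files normally begin with a box whose type field (bytes 4-8) is "ftyp".
    return header[4:8] == b"ftyp"
```

Running this on the file written by save_tiktok should return False for the broken downloads described here, since they fall under the size threshold.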

JBGruber commented 1 year ago

I just fixed it in traktok (https://github.com/JBGruber/traktok/issues/6). @dfreelon the download function now needs the header referer = "https://www.tiktok.com/" and some cookies (I did not test which ones).
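As a rough illustration of the fix described above, the sketch below builds a download request that carries the Referer header and a Cookie header. It uses only the Python standard library and is not the traktok or pyktok implementation; the helper names build_video_request and download_video are hypothetical, and since it was not tested which cookies are required, the caller has to supply whatever cookies their browser session holds.

```python
import urllib.request


def build_video_request(video_url, cookies, user_agent="Mozilla/5.0"):
    """Build a request with the Referer header and the given cookies.

    `cookies` is a dict of cookie name -> value. Which cookies TikTok
    actually requires is not known here, so none are hard-coded.
    """
    req = urllib.request.Request(video_url)
    req.add_header("Referer", "https://www.tiktok.com/")
    req.add_header("User-Agent", user_agent)
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    if cookie_header:
        req.add_header("Cookie", cookie_header)
    return req


def download_video(video_url, out_path, cookies):
    """Download the video to `out_path`; urlopen raises HTTPError on e.g. 403."""
    req = build_video_request(video_url, cookies)
    with urllib.request.urlopen(req, timeout=30) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

Keeping the request construction separate from the network call makes it possible to inspect the headers without contacting the server.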

aurman21 commented 1 year ago

oh that was quick, works like a charm in traktok now, just tested!

dfreelon commented 1 year ago

@aurman21 Try this: pyk.save_tiktok('https://www.tiktok.com/@tiktok/video/7106594312292453675?is_copy_url=1&is_from_webapp=v1',True,browser_name='chrome') (this requires that you have Chrome installed; you can also use browser_name='firefox')

aurman21 commented 1 year ago

@dfreelon just tried, still the same issue with pyktok (but traktok now works again; see @JBGruber's solution above, I guess that was the issue for both pyktok and traktok)

dfreelon commented 1 year ago

Here's a revised version of save_tiktok with that header added which you can try. Use the same call I posted above to try it (i.e. browser_name needs to be either firefox or chrome, and this may not work on command-line only systems):

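import os
import re

import pandas as pd
import requests

# get_tiktok_json and generate_data_row are defined elsewhere in pyktok.
headers = {'Accept-Encoding': 'gzip, deflate, sdch',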
           'Accept-Language': 'en-US,en;q=0.8',
           'Upgrade-Insecure-Requests': '1',
           'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
           'Cache-Control': 'max-age=0',
           'Connection': 'keep-alive'}

def save_tiktok(video_url,
                save_video=True,
                metadata_fn='',
                browser_name=None):
    if not save_video and metadata_fn == '':
        print('Since save_video and metadata_fn are both False/blank, the program did nothing.')
        return

    tt_json = get_tiktok_json(video_url,browser_name)

    if save_video:
        # Build the output filename from the part of the URL between '@' and '?'
        regex_url = re.findall(r'(?<=@)(.+?)(?=\?|$)',video_url)[0]
        video_fn = regex_url.replace('/','_') + '.mp4'
        tt_video_url = tt_json['ItemList']['video']['preloadList'][0]['url']
        # The video download now needs a referer header
        headers['referer'] = 'https://www.tiktok.com/'
        tt_video = requests.get(tt_video_url,allow_redirects=True,headers=headers)
        with open(video_fn, 'wb') as fn:
            fn.write(tt_video.content)
        print("Saved video\n",tt_video_url,"\nto\n",os.getcwd())

    if metadata_fn != '':
        data_slot = tt_json['ItemModule'][list(tt_json['ItemModule'].keys())[0]]
        data_row = generate_data_row(data_slot)
        try:
            data_row.loc[0,"author_verified"] = tt_json['UserModule']['users'][list(tt_json['UserModule']['users'].keys())[0]]['verified']
        except Exception:
            pass
        if os.path.exists(metadata_fn):
            metadata = pd.read_csv(metadata_fn,keep_default_na=False)
            combined_data = pd.concat([metadata,data_row])
        else:
            combined_data = data_row
        combined_data.to_csv(metadata_fn,index=False)
        print("Saved metadata for video\n",video_url,"\nto\n",os.getcwd())
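
To make the filename step in the function above easier to follow, here is the same regex applied in isolation. The wrapper name video_filename is hypothetical and only used for illustration; the pattern and the replace call are taken from the code above, written with a raw string so that `\?` is not treated as a string escape.

```python
import re


def video_filename(video_url):
    """Derive the .mp4 filename the same way save_tiktok does."""
    # Capture everything after '@' up to the first '?' (or the end of the URL).
    path_part = re.findall(r'(?<=@)(.+?)(?=\?|$)', video_url)[0]
    # Slashes are not valid in filenames, so they become underscores.
    return path_part.replace('/', '_') + '.mp4'


url = 'https://www.tiktok.com/@tiktok/video/7106594312292453675?is_copy_url=1&is_from_webapp=v1'
print(video_filename(url))  # tiktok_video_7106594312292453675.mp4
```

The file is written to the current working directory, which is why the function prints os.getcwd() after saving.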

aurman21 commented 1 year ago

thx!