dfreelon / pyktok

A simple module to collect video, text, and metadata from Tiktok.
BSD 3-Clause "New" or "Revised" License
346 stars, 45 forks

Request error with slides download #38

[Open] Lexsah opened this issue 11 months ago

Lexsah commented 11 months ago

Hello, I updated to the latest version (.19) to get the fix for #35. Videos work fine now, but slides don't.

I get the following error: Invalid URL '': No scheme supplied. Perhaps you meant https://? Am I the only one having this issue?

Thanks in advance.

dfreelon commented 11 months ago

@BillyBSig implemented this part of the code, so I'll need to take some time to test it. That said, the "No scheme supplied" error can occur if you omit the "https://" part of the TikTok URL (e.g. if you use "tiktok.com/@whateveretc..." instead of "https://tiktok.com/@whateveretc...").
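If some of your input URLs might lack the scheme, a minimal sketch like the one below can normalize them before they are passed to pyk.save_tiktok. The helper name ensure_scheme is hypothetical (it is not part of pyktok), and it assumes https is always the right scheme for TikTok links.

```python
def ensure_scheme(url: str) -> str:
    """Prepend "https://" to a TikTok URL that has no scheme (hypothetical helper)."""
    url = url.strip()
    if url.startswith(("http://", "https://")):
        return url  # already has a scheme, leave it alone
    # Drop leading slashes, e.g. from a protocol-relative "//www.tiktok.com/..."
    return "https://" + url.lstrip("/")


print(ensure_scheme("tiktok.com/@whateveretc"))  # https://tiktok.com/@whateveretc
print(ensure_scheme("https://vm.tiktok.com/ZGJoHUT3e/"))  # returned unchanged
```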

Lexsah commented 11 months ago

I'm using the same parsing for both videos and slides, and I had no issues with it previously. Here is a snippet of my code, with the corresponding output for both a slide and a video:

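import pyktok as pyk

urls = ["https://vm.tiktok.com/ZGJoHUT3e/", "https://vm.tiktok.com/ZGed6wJx2/"]
for i in urls:
    try:
        print(f"Trying to save {i}")
        pyk.save_tiktok(i, True)  # True: also download the video file
        print("Save successful")
    except Exception as e:
        print(f"Tiktok save error: {e}")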

Output:

Trying to save https://vm.tiktok.com/ZGJoHUT3e/
Tiktok save error: Invalid URL '': No scheme supplied. Perhaps you meant https://?
Trying to save https://vm.tiktok.com/ZGed6wJx2/
Saved video
 https://v16-webapp-prime.tiktok.com/video/tos/maliva/tos-maliva-ve-0068c799-us/oUCIAEmgjgvQKA6ff1gvqeC9sW [...]

So the issue isn't a missing "https://": the scheme is included in the URLs I pass in.

dfreelon commented 11 months ago

OK, I figured out the problem: the address used for downloading videos is blank for slideshows. It'll take me some time to fix, but if you want to access the data yourself in the meantime, you can find it in the output of alt_get_tiktok_json at tt_json["__DEFAULT_SCOPE__"]['webapp.video-detail']['itemInfo']['itemStruct']['imagePost'].
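Until slideshows are handled natively, one way to tell the two post types apart is to check whether itemStruct contains the imagePost key mentioned above. The sketch below assumes that JSON layout, and that each entry of imagePost["images"] carries an imageURL.urlList list (the layout used in the slideshow snippet later in this thread). The function name slide_urls and the fake payload are hypothetical, for illustration only.

```python
from typing import Any, Optional


def slide_urls(tt_json: dict[str, Any]) -> Optional[list[str]]:
    """Return the first image URL of each slide, or None if the post is not a slideshow."""
    item_struct = tt_json["__DEFAULT_SCOPE__"]["webapp.video-detail"]["itemInfo"]["itemStruct"]
    image_post = item_struct.get("imagePost")
    if not image_post:
        return None  # regular video: use the video download address instead
    # Each slide lists one or more candidate URLs; take the first one
    return [img["imageURL"]["urlList"][0] for img in image_post["images"]]


# Minimal fake payload with the same nesting as the alt_get_tiktok_json output
fake = {"__DEFAULT_SCOPE__": {"webapp.video-detail": {"itemInfo": {"itemStruct": {
    "imagePost": {"images": [{"imageURL": {"urlList": ["https://example.com/1.jpg"]}}]}
}}}}}
print(slide_urls(fake))  # ['https://example.com/1.jpg']
```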

tomasruizt commented 3 months ago

@dfreelon The error @Lexsah pointed out can also happen for videos.

Specifically, it happens when the video URL stored under the key downloadAddr is the empty string ''. One example where this happens for me: https://www.tiktok.com/@lalobita0802/video/7386887684125461790

I noticed that instead of using the key downloadAddr, I can fall back to the key playAddr, which also points to the video. As an added benefit, the video in playAddr has no TikTok watermark (compare the screenshots below). Perhaps it would be better to switch to using playAddr as the default? Or is there some downside to it?

Watermark comparison (post: https://www.tiktok.com/@kontennonkreator/video/7381864022871706885)

[Screenshots: downloadAddr | playAddr]

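The fallback described above can be sketched as follows. It assumes both keys sit under itemStruct["video"] in the alt_get_tiktok_json output; that location is an assumption, as is the hypothetical helper name pick_video_url. Following the suggestion to make playAddr the default, it tries playAddr first, falls back to downloadAddr, and returns None when both are empty.

```python
from typing import Any, Optional


def pick_video_url(video: dict[str, Any]) -> Optional[str]:
    """Return the first non-empty video address (hypothetical helper).

    `video` is assumed to be itemStruct["video"] from alt_get_tiktok_json.
    """
    # playAddr first (no TikTok watermark), downloadAddr as the fallback
    for key in ("playAddr", "downloadAddr"):
        addr = video.get(key) or ""
        if addr:
            return addr
    return None  # both empty, e.g. for a slideshow post


print(pick_video_url({"downloadAddr": "", "playAddr": "https://example.com/v.mp4"}))
# https://example.com/v.mp4
```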
dfreelon commented 3 months ago

@tomasruizt Hi, I tested your suggestion and have just implemented it in the code. Thanks.

xxyy-leo commented 3 months ago

It's not working for slideshows (e.g. https://www.tiktok.com/t/ZTNbTmcdc/). TikTok seems to have removed the ['webapp.video-detail'] key for slideshow posts, so we can't get the image download URLs.
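To fail gracefully when that key is missing, instead of raising a KeyError deep inside the parsing code, a lookup along these lines could be used. This is a sketch with a hypothetical helper name (get_item_struct); it only assumes the key path quoted earlier in this thread.

```python
from typing import Any, Optional


def get_item_struct(tt_json: dict[str, Any]) -> Optional[Any]:
    """Return itemStruct, or None if any key on the path is missing (hypothetical helper)."""
    node: Any = tt_json
    for key in ("__DEFAULT_SCOPE__", "webapp.video-detail", "itemInfo", "itemStruct"):
        if not isinstance(node, dict) or key not in node:
            return None  # e.g. TikTok sent no 'webapp.video-detail' for this URL
        node = node[key]
    return node


print(get_item_struct({"__DEFAULT_SCOPE__": {}}))  # None
```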

tomasruizt commented 1 week ago

I've been reliably downloading slideshows and their music for months with code built on top of pyktok; it looks like the snippet below. I just realized that the key is to change the substring photo to video in the URL. The rewritten URL still works in a browser, since TikTok redirects it to the photo URL.

@dfreelon If you want, I can open a PR so that pyktok can download slideshows as well.

import os

import pyktok as pyk
import requests

# Original slideshow URL: "https://www.tiktok.com/@trendysxzl/photo/7398323154424171806"
# Replacing "photo" with "video" is what makes the post data available
url = "https://www.tiktok.com/@trendysxzl/video/7398323154424171806"

tt_json = pyk.alt_get_tiktok_json(video_url=url)
data_slot = tt_json["__DEFAULT_SCOPE__"]["webapp.video-detail"]["itemInfo"]["itemStruct"]

os.makedirs("post", exist_ok=True)  # make sure the output folder exists

# Each slide lists one or more candidate URLs; take the first one
img_urls: list[str] = [img["imageURL"]["urlList"][0] for img in data_slot["imagePost"]["images"]]
for idx, img_url in enumerate(img_urls):
    resp = requests.get(img_url, timeout=30)
    resp.raise_for_status()  # stop on HTTP errors instead of saving an error page
    with open(f"post/{idx}.jpg", "wb") as f:
        f.write(resp.content)
    print(f"Saved {idx}.jpg")

# The slideshow's music track, if it has one
audio_url = data_slot["music"]["playUrl"]
if audio_url == "":
    print("No audio found!")
else:
    resp = requests.get(audio_url, timeout=30)
    resp.raise_for_status()
    with open("post/audio.mp3", "wb") as f:
        f.write(resp.content)
    print("Saved audio.mp3")
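Rather than editing each slideshow URL by hand (as with the commented-out photo URL versus the video URL in the snippet above), the rewrite can be done with a small helper. This is a sketch: the name photo_to_video_url is hypothetical, and it assumes slideshow links have the form https://www.tiktok.com/@user/photo/<id>. Short links such as https://www.tiktok.com/t/... contain no /photo/ segment, so they are returned unchanged.

```python
import re


def photo_to_video_url(url: str) -> str:
    """Rewrite ".../@user/photo/<id>" to ".../@user/video/<id>" (hypothetical helper)."""
    # Only replace a "/photo/" path segment that is followed by a numeric post ID
    return re.sub(r"/photo/(\d+)", r"/video/\1", url, count=1)


print(photo_to_video_url("https://www.tiktok.com/@trendysxzl/photo/7398323154424171806"))
# https://www.tiktok.com/@trendysxzl/video/7398323154424171806
```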

Note: in my own code I also defensively check the post's statusMsg before accessing URLs and other fields, because a post can be unreachable for many reasons, such as status_self_see, status_deleted, or cross_border_violation.

status_msg: str = tt_json["__DEFAULT_SCOPE__"]["webapp.video-detail"]["statusMsg"]
if status_msg != "ok":
    # Handle the failure, e.g. skip this post or raise an error
    raise RuntimeError(f"Post unavailable: {status_msg}")