dfreelon / pyktok

A simple module to collect video, text, and metadata from TikTok.
BSD 3-Clause "New" or "Revised" License

pyktok seems to crash when multipage save is called #41

Closed SachitNayak closed 4 months ago

SachitNayak commented 10 months ago

My code:

import pyktok as pyk
pyk.specify_browser('firefox')

def download_trending_videos(tag_val="datascience"):
    pyk.save_tiktok_multi_page(f'https://www.tiktok.com/tag/{tag_val}?lang=en', save_video=True, save_metadata=False)

if __name__ == "__main__":
    download_trending_videos("datascience")

Error message:

(venv) (base) sacnayak@sacnayak-mac youTik % python3 downloader.py
We strongly recommend you run 'specify_browser' first, which will allow you to run pyktok's functions without using the browser_name parameter every time. 'specify_browser' takes as its sole argument a string representing a browser installed on your system, e.g. "chrome," "firefox," "edge," etc.

Traceback (most recent call last):
  File "/Users/sacnayak/PycharmProjects/youTik/downloader.py", line 10, in <module>
    download_trending_videos("datascience")
  File "/Users/sacnayak/PycharmProjects/youTik/downloader.py", line 6, in download_trending_videos
    pyk.save_tiktok_multi_page(f'https://www.tiktok.com/tag/{tag_val}?lang=en', save_video=True, save_metadata=False)
  File "/Users/sacnayak/PycharmProjects/youTik/venv/lib/python3.11/site-packages/pyktok/pyktok.py", line 302, in save_tiktok_multi_page
    data_loc = tt_json['ItemModule']
               ~~~~~~~^^^^^^^^^^^^^^
TypeError: 'NoneType' object is not subscriptable
(venv) (base) sacnayak@sacnayak-mac youTik % 

Note that I have already explicitly specified the browser as firefox, yet the warning still pops up.

dfreelon commented 10 months ago

Yes, save_tiktok_multi_page needs to be updated to work with the new JSON data structure TT introduced a while back. It'll probably be at least a few weeks before I get to this, or someone could start a PR and I'll merge it in. Basically, the function needs to be edited to use alt_get_tiktok_json instead of get_tiktok_json, which is now obsolete.
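
For anyone reading the traceback above: the crash happens because the parsed page JSON came back as None and was then indexed with ['ItemModule']. Here is a minimal sketch of a guard that turns that into a readable error. Note that extract_item_module is a hypothetical helper written for illustration, not part of pyktok, and it assumes the page JSON has already been parsed into a dict (or None on failure).

```python
from typing import Any, Optional


def extract_item_module(tt_json: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Return the 'ItemModule' section of a parsed page, or raise a clear error.

    Hypothetical helper for illustration only; not part of pyktok.
    """
    if tt_json is None:
        # This is the case behind "'NoneType' object is not subscriptable".
        raise ValueError("No JSON could be parsed from the page; its format may have changed.")
    if "ItemModule" not in tt_json:
        # The JSON parsed, but the expected section is missing.
        raise ValueError("Page JSON has no 'ItemModule' key; its format may have changed.")
    return tt_json["ItemModule"]


if __name__ == "__main__":
    sample = {"ItemModule": {"123": {"desc": "example"}}}  # made-up data
    print(extract_item_module(sample))
    try:
        extract_item_module(None)
    except ValueError as err:
        print(f"Handled: {err}")
```

This does not fix the underlying data-format change; it only makes the failure explicit instead of surfacing as a TypeError.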

dfreelon commented 10 months ago

As for the security warning, I can't test for that as I don't have easy access to a Mac.

SachitNayak commented 9 months ago

I went through the codebase and I have understood the exact replacement that needs to be made. I could make a PR with the changes.

However, I noticed that get_tiktok_json is used in many other functions. Do you want me to replace all occurrences, or leave the others as they are in case some unit tests or other code break?

Please let me know, thanks

dfreelon commented 9 months ago

Hi, I believe the function alt_get_tiktok_json has completely supplanted the original get_tiktok_json. For a time, TT hadn't completely rolled out the changes to their data storage format, so I left both in. I think get_tiktok_json is completely obsolete at this point, so I have no problem removing it. However, I still don't think save_tiktok_multi_page will work, because I checked and AFAICT TT has removed all data about the videos themselves from user and hashtag pages. But if you can figure out how to make it work, by all means make a PR and I'll merge it in if it works for me.

pulakmehta commented 7 months ago

Hi, I'm stuck here. Have there been any updates?

BryceBlankinship commented 6 months ago

@dfreelon TikTok makes an API call to https://www.tiktok.com/api/post/item_list when on a user's page. The response includes video IDs that can be collected and iterated over using the existing save_tiktok_multi_urls. I'll make a PR to resolve this when I get a chance.
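
A rough sketch of the "collect the IDs" step, to make the idea concrete. The response shape here is an assumption, not something confirmed in this thread: it supposes a top-level itemList array whose entries carry id and author.uniqueId. The function name video_urls_from_item_list and the sample payload are made up for illustration.

```python
from typing import Any


def video_urls_from_item_list(payload: dict[str, Any]) -> list[str]:
    """Build video page URLs from a parsed item_list response.

    Assumes (unverified) that each entry in payload["itemList"] has an "id"
    and an "author" dict with a "uniqueId". Entries missing either are skipped.
    """
    urls = []
    for item in payload.get("itemList") or []:
        video_id = item.get("id")
        username = (item.get("author") or {}).get("uniqueId")
        if video_id and username:
            urls.append(f"https://www.tiktok.com/@{username}/video/{video_id}")
    return urls


if __name__ == "__main__":
    # Made-up payload in the assumed shape; the second entry lacks an author.
    sample = {
        "itemList": [
            {"id": "111", "author": {"uniqueId": "someuser"}},
            {"id": "222"},
        ]
    }
    for url in video_urls_from_item_list(sample):
        print(url)
```

The resulting list of URLs is what would then be handed to save_tiktok_multi_urls.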

dfreelon commented 6 months ago

Great, much appreciated.

dfreelon commented 4 months ago

Hi all, I rewrote save_tiktok_multi_page to run based on the TikTokApi package and it's worked quite well for me in testing. The only catch is, it requires a "headed" browser call to work, which means it works best on GUI systems, and may not work at all on some text-only systems. Also, each call forces Chromium to steal focus from whatever you may be doing on your system, which can be annoying if you're trying to run it in the background. Feel free to test it and LMK how it's working, but I'm gonna close this for now.

BryceBlankinship commented 4 months ago

Hey I'll check it out, been super busy lately so couldn't make a PR myself. Thanks for maintaining this.

dfreelon commented 4 months ago

Great! BTW I figured out the headless/focus-stealing issue so it works even better now!