dfreelon / pyktok

A simple module to collect video, text, and metadata from TikTok.
BSD 3-Clause "New" or "Revised" License
316 stars, 44 forks

TypeError: 'NoneType' object is not subscriptable #33

Closed: BillyBSig closed this issue 9 months ago

BillyBSig commented 10 months ago

Hi, do you have a solution for this issue?

import pyktok as pyk
pyk.specify_browser("firefox")
pyk.save_tiktok("https://www.tiktok.com/@vantoan___/video/7294298719665622305?is_from_webapp=1&sender_device=pc", True, 'video_data.csv', browser_name = "firefox")

[Image: NoneType]

dfreelon commented 10 months ago

I ran your code and it worked fine for me. Maybe upgrade your Python to 3.10? That's what I'm using.

BillyBSig commented 10 months ago

I have tried with Python 3.10 and 3.11, but the result is still the same: 'NoneType' object has no attribute 'keys'.

I don't get any result from the get_tiktok_json function

[Image: keysNone]

because this call returns None:

soup.find('script', attrs={'id':"SIGI_STATE"})
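
For readers following along, here is a minimal sketch of why a missing script tag surfaces as this TypeError. It does not contact TikTok or call pyktok; `load_state` and `first_video_id` are hypothetical stand-ins for a loader that returns None and a caller that guards against it.

```python
import json


def load_state(script_text):
    """Hypothetical stand-in for a loader: returns None when the script tag was not found."""
    if script_text is None:
        return None
    return json.loads(script_text)


def first_video_id(state):
    """Return the first key of ItemModule, or None when the state is missing or has no ItemModule."""
    if state is None or "ItemModule" not in state:
        return None
    return next(iter(state["ItemModule"]), None)


# Indexing the None result directly reproduces the error from the issue title.
state = load_state(None)
try:
    state["ItemModule"]
except TypeError as err:
    error_message = str(err)

print(error_message)
print(first_video_id(state))
```

Checking for None before indexing turns the crash into an explicit "no data" result that the caller can handle.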

JBGruber commented 10 months ago

Have you ever opened TikTok on Firefox? If not, maybe you don't have the necessary cookies?

dfreelon commented 10 months ago

That looks like a cookie issue to me. I was going to suggest trying another browser on your system, again ensuring you've opened TikTok on it at least once.

NativesolDevOps commented 10 months ago

I'm also experiencing the same issue. I'm running Python 3.10 and have tried it on all three browsers: Chrome, Firefox, and Edge.

dfreelon commented 10 months ago

Do you have a TT account and is the browser on or off when you run the program? Also please list your OS and version.

BillyBSig commented 10 months ago

> Have you ever opened TikTok on Firefox? If not, maybe you don't have the necessary cookies?

Yes, I have opened TikTok in both Firefox and Chrome.

BillyBSig commented 10 months ago

> Do you have a TT account and is the browser on or off when you run the program? Also please list your OS and version.

Yes, I have one. I tried it both signed in to my TT account and signed out, and the browser is open when I run the code. I'm using Ubuntu 20.04.6 LTS.

I ran the code separately, and this is the result:

[Image: tt_script_none]

dfreelon commented 10 months ago

OK, please try the following troubleshooting steps:

Also please try the code using pyk.specify_browser("chrome") rather than pyk.specify_browser("firefox") if you haven't already. Please report whether it returns the same error or a different one.

Finally, I would try it out on a non-Ubuntu OS to see if that's the issue. I know Pyktok works on Mac and Windows, but I don't have access to Ubuntu, so I'm not sure what the issue would be there.

BillyBSig commented 10 months ago

I found the problem.

SIGI_STATE is not in my video page source, in either Firefox or Chrome, and I also tried this on Windows.

My video page's source uses __UNIVERSAL_DATA_FOR_REHYDRATION__ instead; I don't know why it's different from yours. The structure and key names in the page source have also changed, so I couldn't use the existing code.

So I needed to make some changes to the code:

# Existing get_tiktok_json, as in the pyktok source (relies on pyktok's module-level imports, headers, and cookies)
def get_tiktok_json(video_url,browser_name=None):
    if 'cookies' not in globals() and browser_name is None:
        raise BrowserNotSpecifiedError
    global cookies
    if browser_name is not None:
        cookies = getattr(browser_cookie3,browser_name)(domain_name='www.tiktok.com')
    tt = requests.get(video_url,
                      headers=headers,
                      cookies=cookies,
                      timeout=20)
    # retain any new cookies that got set in this request
    cookies = tt.cookies
    soup = BeautifulSoup(tt.text, "html.parser")
    tt_script = soup.find('script', attrs={'id':"SIGI_STATE"})
    try:
        tt_json = json.loads(tt_script.string)
    except AttributeError:
        print("The function encountered a downstream error and did not deliver any data, which happens periodically for various reasons. Please try again later.")
        return
    return tt_json

# Alternative get_tiktok_json that reads __UNIVERSAL_DATA_FOR_REHYDRATION__ instead of SIGI_STATE
def alt_get_tiktok_json(video_url,browser_name=None):
    if 'cookies' not in globals() and browser_name is None:
        raise BrowserNotSpecifiedError
    global cookies
    if browser_name is not None:
        cookies = getattr(browser_cookie3,browser_name)(domain_name='www.tiktok.com')
    tt = requests.get(video_url,
                      headers=headers,
                      cookies=cookies,
                      timeout=20)
    # retain any new cookies that got set in this request
    cookies = tt.cookies
    soup = BeautifulSoup(tt.text, "html.parser")
    tt_script = soup.find('script', attrs={'id':"__UNIVERSAL_DATA_FOR_REHYDRATION__"})
    try:
        tt_json = json.loads(tt_script.string)
    except AttributeError:
        print("The function encountered a downstream error and did not deliver any data, which happens periodically for various reasons. Please try again later.")
        return
    return tt_json

# save_tiktok with an added fallback condition
def save_tiktok(video_url,
                save_video=True,
                metadata_fn='',
                browser_name=None):
    if 'cookies' not in globals() and browser_name is None:
        raise BrowserNotSpecifiedError
    if save_video == False and metadata_fn == '':
        print('Since save_video and metadata_fn are both False/blank, the program did nothing.')
        return

    tt_json = get_tiktok_json(video_url,browser_name)

    # If get_tiktok_json (SIGI_STATE) returned data, use the original logic
    if tt_json is not None:
        video_id = list(tt_json['ItemModule'].keys())[0]

        if save_video == True:
            regex_url = re.findall(url_regex, video_url)[0]
            if 'imagePost' in tt_json['ItemModule'][video_id]:
                slidecount = 1
                for slide in tt_json['ItemModule'][video_id]['imagePost']['images']:
                    video_fn = regex_url.replace('/', '_') + '_slide_' + str(slidecount) + '.jpeg'
                    tt_video_url = slide['imageURL']['urlList'][0]
                    headers['referer'] = 'https://www.tiktok.com/'
                    # include cookies with the video request
                    tt_video = requests.get(tt_video_url, allow_redirects=True, headers=headers, cookies=cookies)
                    with open(video_fn, 'wb') as fn:
                        fn.write(tt_video.content)
                    slidecount += 1
            else:
                regex_url = re.findall(url_regex, video_url)[0]
                video_fn = regex_url.replace('/', '_') + '.mp4'
                tt_video_url = tt_json['ItemModule'][video_id]['video']['downloadAddr']
                headers['referer'] = 'https://www.tiktok.com/'
                # include cookies with the video request
                tt_video = requests.get(tt_video_url, allow_redirects=True, headers=headers, cookies=cookies)
                with open(video_fn, 'wb') as fn:
                    fn.write(tt_video.content)
            print("Saved video\n", tt_video_url, "\nto\n", os.getcwd())

        if metadata_fn != '':
            data_slot = tt_json['ItemModule'][video_id]
            data_row = generate_data_row(data_slot)
            try:
                user_id = list(tt_json['UserModule']['users'].keys())[0]
                data_row.loc[0,"author_verified"] = tt_json['UserModule']['users'][user_id]['verified']
            except Exception:
                pass
            if os.path.exists(metadata_fn):
                metadata = pd.read_csv(metadata_fn,keep_default_na=False)
                combined_data = pd.concat([metadata,data_row])
            else:
                combined_data = data_row
            combined_data.to_csv(metadata_fn,index=False)
            print("Saved metadata for video\n",video_url,"\nto\n",os.getcwd())

    else:
        # SIGI_STATE was not found, so fall back to alt_get_tiktok_json
        tt_json = alt_get_tiktok_json(video_url,browser_name)
        if tt_json is None:
            # neither script tag was found; alt_get_tiktok_json already printed a message
            return

        # only download the video when the caller asked for it
        if save_video == True:
            regex_url = re.findall(url_regex, video_url)[0]
            video_fn = regex_url.replace('/', '_') + '.mp4'
            tt_video_url = tt_json["__DEFAULT_SCOPE__"]['webapp.video-detail']['itemInfo']['itemStruct']['video']['downloadAddr']
            headers['referer'] = 'https://www.tiktok.com/'
            # include cookies with the video request
            tt_video = requests.get(tt_video_url, allow_redirects=True, headers=headers, cookies=cookies)
            with open(video_fn, 'wb') as fn:
                fn.write(tt_video.content)
            print("Saved video\n", tt_video_url, "\nto\n", os.getcwd())

        if metadata_fn != '':
            data_slot = tt_json["__DEFAULT_SCOPE__"]['webapp.video-detail']['itemInfo']['itemStruct']
            data_row = generate_data_row(data_slot)
            try:
                # this payload has no UserModule, so read the flag from the author object
                # (assumes the author dict carries a 'verified' key)
                data_row.loc[0,"author_verified"] = data_slot['author']['verified']
            except Exception:
                pass
            if os.path.exists(metadata_fn):
                metadata = pd.read_csv(metadata_fn,keep_default_na=False)
                combined_data = pd.concat([metadata,data_row])
            else:
                combined_data = data_row
            combined_data.to_csv(metadata_fn,index=False)
            print("Saved metadata for video\n",video_url,"\nto\n",os.getcwd())
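
The two near-identical loader functions above could arguably be folded into one that tries each script ID in turn. The following is only a sketch of that idea, not pyktok's implementation: to stay dependency-free it swaps BeautifulSoup for the standard library's `html.parser`, and `ScriptCollector`, `extract_state`, and the sample HTML are all made up for illustration.

```python
import json
from html.parser import HTMLParser

# Script IDs mentioned in this thread, in order of preference.
SCRIPT_IDS = ("SIGI_STATE", "__UNIVERSAL_DATA_FOR_REHYDRATION__")


class ScriptCollector(HTMLParser):
    """Collect the text of every <script> tag that has an id attribute."""

    def __init__(self):
        super().__init__()
        self.scripts = {}
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._current_id = dict(attrs).get("id")

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it.
        if self._current_id is not None:
            previous = self.scripts.get(self._current_id, "")
            self.scripts[self._current_id] = previous + data

    def handle_endtag(self, tag):
        if tag == "script":
            self._current_id = None


def extract_state(html):
    """Return (script_id, parsed_json) for the first known script that parses, else (None, None)."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for script_id in SCRIPT_IDS:
        text = collector.scripts.get(script_id)
        if not text:
            continue
        try:
            return script_id, json.loads(text)
        except json.JSONDecodeError:
            continue
    return None, None


# Made-up page source containing only the newer script tag.
sample_html = (
    '<html><body><script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" '
    'type="application/json">{"__DEFAULT_SCOPE__": {"a": 1}}</script></body></html>'
)
found_id, found_state = extract_state(sample_html)
print(found_id, found_state)
```

Returning the script ID alongside the data lets the caller decide which key layout to expect, instead of inferring it from a None result.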

dfreelon commented 10 months ago

Glad you found a solution. If you have time, may I suggest adding a pull request so I can update Pyktok? If not, I'll do it when time permits. I will of course credit you on the main page. Anyone else seeing this can also feel free to copy the code and do a PR.

Can I also ask what country you live in? I'm guessing the difference in the script IDs might have something to do with that.

JBGruber commented 10 months ago

@BillyBSig, does this actually work with the data in __UNIVERSAL_DATA_FOR_REHYDRATION__:

video_id = list(tt_json['ItemModule'].keys())[0]
tt_json['ItemModule'][video_id]['video']['downloadAddr']

I always get both IDs, but only the data in SIGI_STATE is actually useful (I observed this in several countries in Europe).

BillyBSig commented 10 months ago

> Glad you found a solution. If you have time, may I suggest adding a pull request so I can update Pyktok? If not, I'll do it when time permits. I will of course credit you on the main page. Anyone else seeing this can also feel free to copy the code and do a PR.

Sure, I have opened a new pull request. Thanks for taking the time to review it.

> Can I also ask what country you live in? I'm guessing the difference in the script IDs might have something to do with that.

I live in Indonesia. Yes, I guess so; maybe some regions are served different scripts.

BillyBSig commented 10 months ago

> @BillyBSig, does this actually work with the data in __UNIVERSAL_DATA_FOR_REHYDRATION__:
>
> video_id = list(tt_json['ItemModule'].keys())[0]
> tt_json['ItemModule'][video_id]['video']['downloadAddr']

Unfortunately no. The data in __UNIVERSAL_DATA_FOR_REHYDRATION__ has a different structure, and there is no ItemModule in it.

> I always get both IDs, but only the data in SIGI_STATE is actually useful (I observed this in several countries in Europe).

Last month I tried with SIGI_STATE and everything worked fine, but suddenly the script changed to __UNIVERSAL_DATA_FOR_REHYDRATION__. Maybe the script differs between regions.
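
To make the structural difference concrete, here is a small sketch of a lookup that handles both layouts described in this thread. `find_item_struct` is a hypothetical helper, and the sample dictionaries are trimmed, made-up payloads that keep only the key paths quoted above.

```python
def find_item_struct(tt_json):
    """Return the video's metadata dict from either payload layout, or None if neither matches."""
    if not isinstance(tt_json, dict):
        return None

    # SIGI_STATE layout: ItemModule maps a video ID to that video's data.
    item_module = tt_json.get("ItemModule")
    if isinstance(item_module, dict) and item_module:
        return next(iter(item_module.values()))

    # __UNIVERSAL_DATA_FOR_REHYDRATION__ layout, using the key path from this thread.
    try:
        return tt_json["__DEFAULT_SCOPE__"]["webapp.video-detail"]["itemInfo"]["itemStruct"]
    except (KeyError, TypeError):
        return None


# Made-up minimal payloads for illustration only.
sigi_payload = {"ItemModule": {"123": {"video": {"downloadAddr": "https://example.com/a.mp4"}}}}
universal_payload = {
    "__DEFAULT_SCOPE__": {
        "webapp.video-detail": {
            "itemInfo": {"itemStruct": {"video": {"downloadAddr": "https://example.com/b.mp4"}}}
        }
    }
}

print(find_item_struct(sigi_payload)["video"]["downloadAddr"])
print(find_item_struct(universal_payload)["video"]["downloadAddr"])
```

With a helper like this, the download and metadata code would only need to be written once against the returned dict.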

dfreelon commented 9 months ago

Merged in the PR implementing the fix for this.