dfreelon / pyktok

A simple module to collect video, text, and metadata from Tiktok.
BSD 3-Clause "New" or "Revised" License

TimeoutException: Message: #13

Closed: leoleepsyche closed this issue 1 year ago

leoleepsyche commented 1 year ago

Hi there, sorry to bother you again. When I run the code below, it raises a TimeoutException:

pyk.save_visible_comments('https://www.tiktok.com/@tiktok/video/7011536772089924869?is_copy_url=1&is_from_webapp=v1', browser='firefox')

I have pasted the stack trace below:

TimeoutException: Message:
Stacktrace:
RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8
WebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:182:5
NoSuchElementError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:394:5
element.find/</<@chrome://remote/content/marionette/element.sys.mjs:280:16

I hope you can point me in the right direction for solving this problem.

dfreelon commented 1 year ago

Hi, this line of code works fine on my end. I'm not sure why you're receiving Chrome errors when you specified browser='firefox'. Also, I'm not familiar with the format of the error you pasted--it isn't a standard Python traceback, so it would be helpful if you could provide one.

If it's a timeout issue, it could be related to your internet connectivity, so I would try again from another computer or ISP. You can also try changing the browser parameter to either chrome or chromium (you don't need to have these browsers installed, as the function should auto-install anything you don't already have; note that this behavior differs from that of other pyktok functions that take the browser_name parameter).

leoleepsyche commented 1 year ago

Hi there, thanks for your kind reply. Following your advice, I tried changing the browser parameter to chrome and chromium. The error is shown below: [Screenshot 2022-12-12 at 18 01 14] And this is what I get with the browser parameter set to firefox: [Screenshot 2022-12-12 at 18 00 22] The weird thing is that the other functions you listed run fine, so it does not seem to be a connectivity problem. It is really weird.

dfreelon commented 1 year ago

OK, now if you look at the Firefox traceback you'll see that the issue is to do with lines 311 and 312. I can't test this because the function works fine for me, but I suggest changing the number 10 on line 311 to a higher number (20, 30, 40, etc.) and see if that makes a difference. If so, I will add that as a parameter to a future version. Good luck!
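For context, here is a rough sketch of what a timeout of this kind controls. It is a standard-library stand-in for Selenium's WebDriverWait, not pyktok's actual code, and the names wait_until, timeout and poll_interval are made up for illustration. The point it shows: a longer timeout only helps when the element eventually appears; if it never does, the wait fails regardless of the number.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical stand-in for WebDriverWait(driver, timeout).until(...);
    this is not pyktok code.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# A condition that becomes true on the third poll succeeds quickly...
calls = {"n": 0}


def appears_on_third_poll():
    calls["n"] += 1
    return calls["n"] >= 3


found = wait_until(appears_on_third_poll, timeout=1.0, poll_interval=0.01)

# ...while a condition that is never true times out however long we wait.
try:
    wait_until(lambda: False, timeout=0.05, poll_interval=0.01)
    timed_out = False
except TimeoutError:
    timed_out = True
```

If raising the number makes no difference, the more likely explanation is that the awaited element is not on the page at all.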

leoleepsyche commented 1 year ago

Hi @dfreelon, thanks for your help! [Screenshot 2022-12-12 at 18 47 43] I changed the value to higher numbers (20, 40, 50, 100, 1000, etc.) as you advised, but that does not seem to be the cause. If you have any other suggestions, please let me know.

Best Wishes Geng

dfreelon commented 1 year ago

OK, so my next guess is that Selenium may not be rendering your page properly. Try the following code--if the result is True, selenium is rendering the page correctly; if False, it is not:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions
from selenium.webdriver.chrome.service import Service as ChromeiumService #sic
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options as FirefoxOptions
from selenium.webdriver.firefox.service import Service as FirefoxService
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.core.utils import ChromeType
from webdriver_manager.firefox import GeckoDriverManager
import time

f_options = FirefoxOptions()
f_options.add_argument("--headless")
driver = webdriver.Firefox(service=FirefoxService(
                                           GeckoDriverManager().install()),
                                   options=f_options)
driver.get('https://www.tiktok.com/@tiktok/video/7011536772089924869?is_copy_url=1&is_from_webapp=v1')
print('Sleeping for 30 secs, please wait...')
time.sleep(30)
print('tiktok-ku14zo-SpanUserNameText' in driver.page_source)

leoleepsyche commented 1 year ago

[Screenshot 2022-12-12 at 20 03 43] Thanks for your reply. The result is False, so it seems that Selenium is not rendering my page properly.

dfreelon commented 1 year ago

Yeah, the next move is to save the contents of driver.page_source to a file, open it locally in a browser and see if you can figure out what's going on. Could be a login screen or something, but you should NOT see any comments.
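A minimal sketch of that step, assuming the driver object from the previous snippet is still alive. The helper name save_page_source and the file names are made up here, and a stand-in HTML string is used so the snippet runs without a browser:

```python
from pathlib import Path


def save_page_source(page_source, path="page_source.html"):
    """Write the HTML string to disk and return the absolute path."""
    out = Path(path)
    out.write_text(page_source, encoding="utf-8")
    return out.resolve()


# With a live session this would be: save_page_source(driver.page_source)
# A stand-in string is used here so the sketch runs without a browser.
sample_html = "<html><body><h1>Log in to TikTok</h1></body></html>"
saved_path = save_page_source(sample_html, "page_source_sample.html")
print(f"Open this file in a browser: {saved_path}")
```

Opening the saved file locally shows the markup Selenium received, which may differ from what a normal browser session displays.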

leoleepsyche commented 1 year ago

Sorry for the late reply. Yes, you are right! It is a login screen, and I could not see any comments. [Screenshot 2022-12-13 at 11 56 15]

dfreelon commented 1 year ago

Sounds like you're having cookie issues again. Being logged in to TikTok through the same browser specified in the browser parameter works for me (and most other users I assume, since no one else has raised this issue). Another thing you could try is to feed Selenium the cookies it needs, then get the page. But this is quite slow, as it requires Selenium to get the page twice:

import browser_cookie3

from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions
from selenium.webdriver.chrome.service import Service as ChromeiumService #sic
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options as FirefoxOptions
from selenium.webdriver.firefox.service import Service as FirefoxService
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.core.utils import ChromeType
from webdriver_manager.firefox import GeckoDriverManager
import time

f_options = FirefoxOptions()
f_options.add_argument("--headless")
driver = webdriver.Firefox(service=FirefoxService(
                                           GeckoDriverManager().install()),
                                   options=f_options)

driver.get('https://www.tiktok.com/@tiktok/video/7011536772089924869?is_copy_url=1&is_from_webapp=v1')

tt_cookies = browser_cookie3.firefox(domain_name='tiktok.com')

for cookie in tt_cookies:  # session cookies
    cookie_dict = {'domain': 'tiktok.com', 'name': cookie.name, 'value': cookie.value, 'secure': bool(cookie.secure)}
    if cookie.expires:
        cookie_dict['expiry'] = cookie.expires
    if cookie.path_specified:
        cookie_dict['path'] = cookie.path

    driver.add_cookie(cookie_dict)

driver.get('https://www.tiktok.com/@tiktok/video/7011536772089924869?is_copy_url=1&is_from_webapp=v1')
print('Sleeping for 30 secs, please wait...')
time.sleep(30)
print('tiktok-ku14zo-SpanUserNameText' in driver.page_source)
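The cookie conversion in the loop above can also be pulled out into a small function and checked without launching a browser. This is only a sketch: the name to_selenium_cookie is made up, it assumes browser_cookie3 yields standard http.cookiejar.Cookie objects, and the example cookie is invented rather than a real TikTok cookie.

```python
from http.cookiejar import Cookie


def to_selenium_cookie(cookie, domain="tiktok.com"):
    """Convert an http.cookiejar.Cookie into a dict for driver.add_cookie()."""
    cookie_dict = {
        "domain": domain,
        "name": cookie.name,
        "value": cookie.value,
        "secure": bool(cookie.secure),
    }
    # Session cookies have no expiry, so the key is only set when present.
    if cookie.expires:
        cookie_dict["expiry"] = cookie.expires
    if cookie.path_specified:
        cookie_dict["path"] = cookie.path
    return cookie_dict


# Made-up cookie for illustration only; real ones come from browser_cookie3.
example = Cookie(
    version=0, name="example_name", value="example_value",
    port=None, port_specified=False,
    domain=".tiktok.com", domain_specified=True, domain_initial_dot=True,
    path="/", path_specified=True,
    secure=True, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)
converted = to_selenium_cookie(example)
print(converted)
```

Printing the converted dicts for the real cookies is a quick way to confirm that browser_cookie3 found any TikTok cookies at all before Selenium is involved.
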
leoleepsyche commented 1 year ago

[Screenshot 2022-12-13 at 17 24 54] Hi there, thank you again. Unfortunately the result is still False. Sorry about that.

dfreelon commented 1 year ago

Well I'm all out of ideas... all I can tell you is, the system is not recognizing you as being logged in, we know that from the login page you got. Since I can't test it on your system, I have no idea why that's not happening if you are logged in. Only other thing I can think of is try on a different computer/ISP. Good luck, I'm going to close this until someone else can reproduce the same issue.

leoleepsyche commented 1 year ago

A big thanks to you for your previous help!

Best Wishes Geng

christinapwalker commented 1 year ago

Hi, I'm having the same issue now....tried the solutions above (and have run it on two different ISPs) without luck....unsure if any other solutions have been found?

dfreelon commented 1 year ago

Well that's not good... unfortunately there's not much I can do to troubleshoot this directly since I can't reproduce the issue. If you really wanted to, you could experiment with Selenium to try and make it deliver the proper page (and not a login screen). Here's some code to get you started (assumes you have Chrome; can be modified for Firefox or Chromium). Running this should produce a Chrome window where you can see the video and the comments below it. The system didn't require a login for me, so if you see a login screen I'm not sure why. Good luck!

from selenium import webdriver
from selenium.webdriver.chrome.options import Options as ChromeOptions
from selenium.webdriver.chrome.service import Service as ChromeiumService
from webdriver_manager.chrome import ChromeDriverManager

c_options = ChromeOptions()
#c_options.add_argument("--headless")
driver = webdriver.Chrome(service=ChromeiumService(
                                  ChromeDriverManager().install()),
                          options=c_options)

driver.get('https://www.tiktok.com/@tiktok/video/7011536772089924869?is_copy_url=1&is_from_webapp=v1')

dfreelon commented 1 year ago

The result of the code from the previous comment should look something like this (I scrolled down to show the first comment):

[Screenshot 2022-12-16 at 11 45 38 AM]

christinapwalker commented 1 year ago

Super strange... when I run that code, I get the same page as in your screenshot, but pyk.save_visible_comments('url') keeps returning the same error as in the comments above.

dfreelon commented 1 year ago

OK, I think I figured something out. In the pyktok source code, replace lines 315-316 with the following:

wait.until(EC.presence_of_element_located((By.XPATH, "//*[contains(@class,'SpanUserNameText')]")))

Then start a new console and try it. Note that you'll need to run pyktok.py locally and call the functions without the module prefix (e.g. save_visible_comments instead of pyk.save_visible_comments); importing the old installed version won't pick up the change.
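To illustrate the difference between this substring match and the exact class string checked in the earlier snippets, here is a small standard-library sketch. The markup and its generated prefixes are invented, and whether TikTok's prefixes actually vary between sessions is an assumption, not something confirmed in this thread.

```python
from html.parser import HTMLParser


class ClassSubstringFinder(HTMLParser):
    """Collect the class attribute of every tag whose class contains `needle`."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        class_value = dict(attrs).get("class") or ""
        if self.needle in class_value:
            self.matches.append(class_value)


# Hypothetical markup: the generated prefixes are invented for illustration.
sample_html = (
    '<span class="tiktok-aaaaaa-SpanUserNameText e1x">user_one</span>'
    '<span class="tiktok-bbbbbb-SpanUserNameText e1x">user_two</span>'
    '<p class="tiktok-cccccc-PCommentText">a comment</p>'
)

finder = ClassSubstringFinder("SpanUserNameText")
finder.feed(sample_html)

# The exact string from the earlier check is absent from this markup,
# while the substring match still finds both user name spans.
exact_hit = "tiktok-ku14zo-SpanUserNameText" in sample_html
substring_hits = len(finder.matches)
print(exact_hit, substring_hits)
```

The XPath contains(@class, ...) expression does the same kind of substring test inside the browser.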

christinapwalker commented 1 year ago

Okay, so I kept tinkering with it and ended up realizing that the reason you couldn't replicate it was that the video had no comments. Since I was iterating through a dataset, I didn't even realize or think about it.

But in case others want to pull comments from a list/column of URLs, here is what I added to the source code (it also concatenates the data together, includes the video URL, has a progress bar, etc.). It is based on Chrome but could easily be changed to include Firefox:

Packages (at this point...I'm not even sure which ones I added/changed and which were already there, so I'm listing all of them):

import browser_cookie3
from bs4 import BeautifulSoup
from datetime import datetime
import json
import numpy as np
import os
import pandas as pd
import random
import re
import requests
import time
import progressbar

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.core.utils import ChromeType

And the function:

def save_multiple_comments(video_urls,
    csv_name,
    comment_fn=None,
    browser='chrome'):
    dataframe = pd.DataFrame()
    bar = progressbar.ProgressBar()

    # Visit each URL exactly once; the progress bar wraps the list directly.
    for url in bar(video_urls):
        c_options = Options()
        c_options.add_argument("--headless")
        driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()), options=c_options)
        try:
            driver.get(url)
            # Wait up to 10 seconds for at least one commenter name to appear.
            WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH, "//*[contains(@class,'SpanUserNameText')]")))

        except TimeoutException:
            # Nothing appeared in time, which is taken here to mean no comments.
            print(f"{url} has no comments.")
            continue

        else:
            soup = BeautifulSoup(driver.page_source, "html.parser")
            ids_tags = soup.find_all('div',{'class':re.compile('DivCommentContentContainer')})
            comment_ids = [i.get('id') for i in ids_tags]
            names_tags = soup.find_all('a',attrs={'class':re.compile("StyledUserLinkName")})
            styled_names = [i.text.strip() for i in names_tags]
            screen_names = [i.get('href').replace('/','') for i in names_tags]
            comments_tags = soup.find_all('p',attrs={'class':re.compile("PCommentText")})
            comments = [i.text.strip() for i in comments_tags]
            likes_tags = soup.find_all('span',attrs={'class':re.compile('SpanCount')})
            # Keep numeric like counts as ints; leave abbreviated counts as text.
            likes = [int(i.text.strip())
                     if i.text.strip().isnumeric()
                     else i.text.strip()
                     for i in likes_tags]
            timestamp = datetime.now().isoformat()
            temp = pd.DataFrame(
                {'comment_id': comment_ids,
                'styled_name': styled_names,
                'screen_name': screen_names,
                'comment': comments,
                'like_count': likes,
                'time_collected': [timestamp] * len(likes),
                'url': url
                }
            )
            dataframe = pd.concat([dataframe, temp], ignore_index=True)
            dataframe = dataframe.drop_duplicates(subset='comment_id')

        finally:
            # Close the browser whether or not comments were found.
            driver.quit()

    dataframe.to_csv(f"{csv_name}.csv")
    print(f"File saved as {csv_name}.csv")

Thanks so much for all your help.

dfreelon commented 1 year ago

@christinapwalker Thanks for the code. It's pretty simple to pull comments from multiple videos using the existing save_visible_comments function, e.g.:

tiktok_videos = ['https://www.tiktok.com/@tiktok/video/7106594312292453675?is_copy_url=1&is_from_webapp=v1',
                 'https://www.tiktok.com/@tiktok/video/7011536772089924869?is_copy_url=1&is_from_webapp=v1']
for v in tiktok_videos:
    pyk.save_visible_comments(v,'tiktok_comments.csv')

But the error handling code for videos with no comments is important so I will incorporate that into the next version and credit you as I have other contributors.
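Until that lands, a loop like the one above can be wrapped so that one failing video does not stop the whole run. This is a rough sketch, not the planned pyktok implementation: collect_all, collect_one and skip_on are made-up names, and a fake collector stands in for pyk.save_visible_comments so the snippet runs without a browser.

```python
def collect_all(video_urls, collect_one, skip_on=(Exception,)):
    """Call `collect_one(url)` for each URL and return the URLs that failed.

    `collect_one` is a placeholder for a call such as
    lambda url: pyk.save_visible_comments(url, 'tiktok_comments.csv').
    """
    failed = []
    for url in video_urls:
        try:
            collect_one(url)
        except skip_on as error:
            # Record the failure and move on to the next video.
            print(f"Skipping {url}: {type(error).__name__}")
            failed.append(url)
    return failed


# Stand-in collector so the sketch runs without a browser: it "times out"
# on any URL containing 'no-comments'.
def fake_collect(url):
    if "no-comments" in url:
        raise TimeoutError("no comment elements appeared")


urls = ["https://example.com/video/1", "https://example.com/video/no-comments"]
failed_urls = collect_all(urls, fake_collect, skip_on=(TimeoutError,))
print(failed_urls)
```

With the real function, skip_on would presumably be Selenium's TimeoutException, and the returned list shows which videos to recheck by hand.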

dfreelon commented 1 year ago

@christinapwalker One more thing: I found that code like this:

dataframe = dataframe.drop_duplicates(subset='comment_id')

was insufficient to deduplicate my dataframes, as the resulting data still contained duplicate comments. Apparently Pandas sometimes has difficulty deduplicating columns full of long integers. I ended up having to convert the comment_id field to string format first for it to work. YMMV but it's a subtle problem and worth checking out if you're planning on using this for research.
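A small sketch of the string conversion follows. The exact cause is not pinned down in this thread; the example assumes one situation where the conversion certainly matters, namely the same ID stored once as an integer and once as a string (for instance after mixing freshly scraped rows with rows read back from a CSV). The data is made up.

```python
import pandas as pd

# Illustrative data: the same comment ID appears once as an int and once as a string.
dataframe = pd.DataFrame({
    "comment_id": [7011536772089924869, "7011536772089924869", "7106594312292453675"],
    "comment": ["first", "first", "second"],
})

# Mixed types compare as different values, so nothing is dropped here.
before = dataframe.drop_duplicates(subset="comment_id")

# Casting to string first makes equal IDs compare equal.
dataframe["comment_id"] = dataframe["comment_id"].astype(str)
after = dataframe.drop_duplicates(subset="comment_id")

print(len(before), len(after))
```

Comparing the row counts before and after the cast is an easy check to run on real data.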