Data scraping unable to complete in 30.0s

ahm750 commented 1 year ago

I'm trying to use this package to scrape user profiles and it returns the error below. Any reason why it's happening and how to fix it?

Traceback (most recent call last):
  File "/home/usr1/.local/lib/python3.8/site-packages/tiktokapipy/api.py", line 318, in _scrape_data
    page.wait_for_selector("#SIGI_STATE", state="attached")
  File "/home/usr1/.local/lib/python3.8/site-packages/playwright/sync_api/_generated.py", line 7991, in wait_for_selector
    self._sync(
  File "/home/usr1/.local/lib/python3.8/site-packages/playwright/_impl/_sync_base.py", line 104, in _sync
    return task.result()
  File "/home/usr1/.local/lib/python3.8/site-packages/playwright/_impl/_page.py", line 364, in wait_for_selector
    return await self._main_frame.wait_for_selector(**locals_to_params(locals()))
  File "/home/usr1/.local/lib/python3.8/site-packages/playwright/_impl/_frame.py", line 322, in wait_for_selector
    await self._channel.send("waitForSelector", locals_to_params(locals()))
  File "/home/usr1/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 44, in send
    return await self._connection.wrap_api_call(
  File "/home/usr1/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 419, in wrap_api_call
    return await cb()
  File "/home/usr1/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 79, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.TimeoutError: Timeout 30000ms exceeded.
=========================== logs ===========================
waiting for locator("#SIGI_STATE")
============================================================
Traceback (most recent call last):
  File "main.py", line 40, in <module>
    user = api.user("andrew.araya", "3")
  File "/home/usr1/.local/lib/python3.8/site-packages/tiktokapipy/api.py", line 269, in user
    response, api_extras = self._scrape_data(link, self._user_response_type)
  File "/home/usr1/.local/lib/python3.8/site-packages/tiktokapipy/api.py", line 333, in _scrape_data
    raise TikTokAPIError(
tiktokapipy.TikTokAPIError: Data scraping unable to complete in 30.0s (retries: 0)

Here's my code:

from tiktokapipy.api import TikTokAPI

with TikTokAPI() as api:
    user = api.user("therock", "3")
    print(user)

Russell-Newton commented 1 year ago

This is probably happening because the page isn't loading correctly. The easiest fix is to increase the navigation_retries parameter in the TikTokAPI constructor.

Specifically, TikTok inserts a script element into every page that contains the preloaded data used to set the page's initial state. TikTokPy grabs this tag, which is what the wait_for_selector call is for: it just makes sure that the element is there.
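To illustrate what "grabbing this tag" amounts to, here is a rough standard-library sketch that pulls the JSON out of a script element with id SIGI_STATE. This is not TikTokPy's implementation (the library waits for the element through Playwright, as the traceback shows). The class name, the function name, and the sample HTML with its keys are made up for the illustration, and the sketch assumes the element's content is JSON.

```python
import json
from html.parser import HTMLParser


class ScriptByIdParser(HTMLParser):
    """Collects the text content of the <script> element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.found = False
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == self.element_id:
            self.found = True
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content can arrive in several pieces, so collect them all.
        if self._inside:
            self.chunks.append(data)


def extract_state(html, element_id="SIGI_STATE"):
    """Return the parsed JSON state, or None if the element is absent."""
    parser = ScriptByIdParser(element_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return json.loads("".join(parser.chunks))


# Hypothetical page content; the real payload is much larger.
SAMPLE_PAGE = (
    '<html><body><script id="SIGI_STATE" type="application/json">'
    '{"example": {"user": "therock"}}'
    "</script></body></html>"
)
```

When the element is missing (for example on a captcha page), extract_state returns None, which corresponds to the situation where wait_for_selector times out.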

Reasons the element may not be present:

the user doesn't exist
the scraper encountered a captcha
something else I haven't seen

In my experience, one or two navigation retries are enough to get past these issues.

As a side note, you'll probably need to make sure the second parameter in api.user is an int, not a string.
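Putting both suggestions together might look roughly like the sketch below. navigation_retries and api.user come from this thread; the helper names (build_api_kwargs, as_video_limit, print_user) are hypothetical and only exist to keep the example small. The TikTokAPI import sits inside the function so the helpers can be used without the package installed.

```python
def build_api_kwargs(retries=2):
    """Keyword arguments for the TikTokAPI constructor (1 or 2 retries is typical)."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    return {"navigation_retries": retries}


def as_video_limit(value):
    """Coerce the video count to an int; a string such as "3" becomes 3."""
    limit = int(value)
    if limit < 0:
        raise ValueError("video limit must be non-negative")
    return limit


def print_user(username, video_limit, retries=2):
    """Scrape a user profile with retries enabled and print it."""
    # Imported here so the helpers above work without tiktokapipy installed.
    from tiktokapipy.api import TikTokAPI

    with TikTokAPI(**build_api_kwargs(retries)) as api:
        user = api.user(username, as_video_limit(video_limit))
        print(user)


# Example call (needs network access and the tiktokapipy package):
# print_user("therock", "3")
```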

Russell-Newton commented 1 year ago

When I use the library I normally set retries to 1 or 2.

Please let me know if this helps!

ahm750 commented 1 year ago

Thanks! I tried passing the navigation_retries parameter and changing the video count parameter to an int, but the problem persists.

Is there any way to use a proxy with it? If so, how can I pass the proxy credentials? Couldn't find it anywhere in the docs.

Russell-Newton commented 1 year ago

Issue #11 asked about adding a proxy. I answered there and also added an example to the docs. Check it out, and if it still isn't working, try running the API with headless=False. There may be some indicator on the page as to what's going on.
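A rough sketch of the debugging setup described above. headless=False is taken from this thread. The proxy dictionary uses the server/username/password shape that Playwright accepts for browser launches; whether TikTokAPI forwards a proxy keyword argument to Playwright is an assumption here, so check the proxy example in the docs for the exact spelling. build_proxy and open_visible are hypothetical helper names, and the server address and credentials are placeholders.

```python
def build_proxy(server, username=None, password=None):
    """Build a Playwright-style proxy settings dict; credentials are optional."""
    proxy = {"server": server}
    if username is not None:
        proxy["username"] = username
    if password is not None:
        proxy["password"] = password
    return proxy


def open_visible(username, video_limit, **extra_kwargs):
    """Run one scrape with a visible browser window to see what the page shows."""
    # Imported here so build_proxy works without tiktokapipy installed.
    from tiktokapipy.api import TikTokAPI

    # ASSUMPTION: extra keyword arguments such as proxy=... reach Playwright.
    with TikTokAPI(headless=False, navigation_retries=2, **extra_kwargs) as api:
        print(api.user(username, int(video_limit)))


# Example call (placeholder proxy details, needs network access):
# open_visible("therock", 3, proxy=build_proxy("http://proxy.example:8080", "user", "pass"))
```

With the window visible, a captcha or an error page should be easy to spot while the scraper waits for the element.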

Russell-Newton commented 1 year ago

Update from #29: proxy settings should actually work now.

Russell-Newton / TikTokPy, issue #16