Russell-Newton / TikTokPy

Extract data from TikTok without needing any login information or API keys.
https://pypi.org/project/tiktokapipy/
MIT License
197 stars, 25 forks

[BUG] Timeout with firefox and Emulate Mobile #33

Closed vleg2022 closed 1 year ago

vleg2022 commented 1 year ago

Describe the bug
The only way I could download comments is with this:

with TikTokAPI(navigator_type="firefox", emulate_mobile=True) as api:

This code works when I run it on my MacBook in VS Code. But when I put it on EC2 or another server, I get a timeout message, and it only works if I use emulate_mobile=False. The problem with that configuration is that I don't get the comments. I checked the versions of all the libraries and everything matches. I also checked the firewall and other security settings on EC2; there's nothing else to do there.

Has anyone else had this issue?

To Reproduce
Steps to reproduce the behavior: run this code on EC2:

from tiktokapipy.api import TikTokAPI

with TikTokAPI(navigator_type="firefox", navigation_retries=1, navigation_timeout=0, emulate_mobile=True) as api:
    video = api.video("https://vm.tiktok.com/ZMYrkvoqy")
    frase = video.comments[0].text.capitalize()
    print(frase)

See the error below:

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/tiktokapipy/api.py", line 355, in _scrape_data
    page.wait_for_selector("#SIGI_STATE", state="attached")
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/sync_api/_generated.py", line 8213, in wait_for_selector
    self._sync(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_sync_base.py", line 104, in _sync
    return task.result()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_page.py", line 364, in wait_for_selector
    return await self._main_frame.wait_for_selector(**locals_to_params(locals()))
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_frame.py", line 322, in wait_for_selector
    await self._channel.send("waitForSelector", locals_to_params(locals()))
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 44, in send
    return await self._connection.wrap_api_call(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 419, in wrap_api_call
    return await cb()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 79, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.TimeoutError: Timeout 30000ms exceeded.
=========================== logs ===========================
waiting for locator("#SIGI_STATE")

Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/tiktokapipy/api.py", line 355, in _scrape_data
    page.wait_for_selector("#SIGI_STATE", state="attached")
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/sync_api/_generated.py", line 8213, in wait_for_selector
    self._sync(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_sync_base.py", line 104, in _sync
    return task.result()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_page.py", line 364, in wait_for_selector
    return await self._main_frame.wait_for_selector(**locals_to_params(locals()))
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_frame.py", line 322, in wait_for_selector
    await self._channel.send("waitForSelector", locals_to_params(locals()))
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 44, in send
    return await self._connection.wrap_api_call(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 419, in wrap_api_call
    return await cb()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 79, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.TimeoutError: Timeout 30000ms exceeded.
=========================== logs ===========================
waiting for locator("#SIGI_STATE")

Traceback (most recent call last):
  File "/home/ubuntu/bla.py", line 4, in <module>
    video = api.video("https://vm.tiktok.com/ZMYrkvoqy")
  File "/home/ubuntu/.local/lib/python3.10/site-packages/tiktokapipy/api.py", line 316, in video
    response, api_extras = self._scrape_data(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/tiktokapipy/api.py", line 370, in _scrape_data
    raise TikTokAPIError(
tiktokapipy.TikTokAPIError: Data scraping unable to complete in 0.0s (retries: 1)
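The last error line combines the navigation_timeout and navigation_retries settings, and the timeout traceback appears twice. A minimal sketch of how a retry wrapper could produce exactly that shape, assuming each retry is one extra attempt (so retries=1 means two attempts) and that each failed attempt is reported. ScrapeError, scrape_with_retries, and always_times_out are hypothetical names for illustration, not TikTokPy's actual code.

```python
class ScrapeError(Exception):
    """Hypothetical stand-in for tiktokapipy.TikTokAPIError."""


def scrape_with_retries(attempt, retries, timeout):
    """Call attempt(timeout) up to 1 + retries times (assumed semantics).

    Every failed attempt is reported, which is one way a library could end
    up printing the same timeout traceback once per attempt.
    """
    for n in range(1 + retries):
        try:
            return attempt(timeout)
        except TimeoutError as exc:
            print(f"attempt {n + 1} failed: {exc}")
    # All attempts failed: summarize the configured timeout and retry count.
    raise ScrapeError(
        f"Data scraping unable to complete in {float(timeout)}s (retries: {retries})"
    )


calls = []


def always_times_out(timeout):
    """Simulated navigation that never finds #SIGI_STATE."""
    calls.append(timeout)
    raise TimeoutError("Timeout 30000ms exceeded.")


try:
    scrape_with_retries(always_times_out, retries=1, timeout=0)
except ScrapeError as err:
    print(err)  # Data scraping unable to complete in 0.0s (retries: 1)
```

Under this assumption, two printed timeouts followed by "retries: 1" is consistent with one initial attempt plus one retry.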

Expected behavior
I expect it to return JSON with the video data and comments.

Version Information
tiktokapipy==0.1.11
pydantic==1.10.4
playwright==1.30.0

Additional context
Nothing else.

Russell-Newton commented 1 year ago

Couple of notes:

Please try changing the timeout. If the problem persists, let me know and I can take a look.

vleg2022 commented 1 year ago

Thank you @Russell-Newton for your reply! :-) Russell, I tried running the code below and the timeout error persists. Interestingly, the code runs fine on my local computer, but on AWS EC2 I get the timeout error. Just confirming that I checked all the firewall configuration on EC2, and other requests made with BeautifulSoup (to other sites) work.

from tiktokapipy.api import TikTokAPI

def do_something():
    with TikTokAPI(navigator_type="firefox",emulate_mobile=False,navigation_retries=3,navigation_timeout=0) as api:
        print("aaa")
        video = api.video("https://vm.tiktok.com/ZMY6tXCyt")
        print("bbb")

do_something()

Russell-Newton commented 1 year ago

Try setting navigation_timeout=30.

A big difference between libraries like Beautiful Soup and requests and those like Playwright and TikTokPy is that the former don't load or run JavaScript. TikTokPy looks for an HTML element created by JavaScript, and that element can take a variable amount of time to appear. With navigation_timeout=0, TikTokPy gives the JavaScript no time to create the element, so if the element isn't there immediately, it throws a timeout exception.

You need to set the timeout parameter to something other than 0 so the JavaScript has time to run.
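To make the explanation above concrete, here is a generic sketch of waiting for an element with a timeout. FakePage and wait_for_element are hypothetical: FakePage simulates a page whose #SIGI_STATE element is created by a script after a short delay. This models the behavior as described in the comment above, not TikTokPy's or Playwright's internals.

```python
import time


class FakePage:
    """Hypothetical page where JavaScript creates #SIGI_STATE after a delay."""

    def __init__(self, render_delay):
        self._ready_at = time.monotonic() + render_delay

    def has_element(self, selector):
        # The element only "exists" once the simulated script has run.
        return selector == "#SIGI_STATE" and time.monotonic() >= self._ready_at


def wait_for_element(page, selector, timeout, poll_interval=0.05):
    """Poll until `selector` exists, giving up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if page.has_element(selector):
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"waiting for {selector} exceeded {timeout}s")
        time.sleep(poll_interval)


page = FakePage(render_delay=0.2)
try:
    # timeout=0: the element isn't there yet, so the very first check fails.
    wait_for_element(page, "#SIGI_STATE", timeout=0)
except TimeoutError as exc:
    print("timeout=0 ->", exc)

# timeout=30: the loop keeps polling until the script "creates" the element.
print("timeout=30 ->", wait_for_element(page, "#SIGI_STATE", timeout=30))
```

A static fetch (as with requests and Beautiful Soup) corresponds to checking the page once and never polling again.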

vleg2022 commented 1 year ago

Hi @Russell-Newton, I tried navigation_timeout=30, but it still doesn't work :-(

from tiktokapipy.api import TikTokAPI
def do_something():
    with TikTokAPI(navigator_type="firefox",emulate_mobile=False,navigation_retries=3,navigation_timeout=30) as api:
        print("aaa")
        video = api.video("https://vm.tiktok.com/ZMY6tXCyt")
        print("bbb")

do_something()

Error

aaa
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/tiktokapipy/api.py", line 354, in _scrape_data
    page.goto(link)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/sync_api/_generated.py", line 9183, in goto
    self._sync(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_sync_base.py", line 104, in _sync
    return task.result()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_page.py", line 491, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_frame.py", line 147, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 44, in send
    return await self._connection.wrap_api_call(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 419, in wrap_api_call
    return await cb()
  File "/home/ubuntu/.local/lib/python3.10/site-packages/playwright/_impl/_connection.py", line 79, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.TimeoutError: Timeout 30000ms exceeded.
=========================== logs ===========================
navigating to "https://vm.tiktok.com/ZMY6tXCyt", waiting until "load"
============================================================

Russell-Newton commented 1 year ago

Someone else reached out about a similar issue. I suspect there are anti-bot measures that get triggered when TikTokPy runs from a Linux VM, container, or server. This could be related to user agent selection, but I'll need to look into it a bit more.
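One way to test the user-agent hypothesis is to compare the user agent the browser sends on the server with the one it sends on the laptop (in Playwright, evaluating navigator.userAgent in the page returns it). The sketch below flags tokens that distinguish a headless or desktop-Linux browser. The token list and the sample user-agent strings are illustrative assumptions; TikTok's actual bot detection is not public.

```python
# Illustrative assumption: substrings that reveal a headless or desktop-Linux
# browser. TikTok's real bot detection is not public.
SUSPICIOUS_TOKENS = ("HeadlessChrome", "X11")


def user_agent_red_flags(user_agent: str) -> list[str]:
    """Return the suspicious tokens found in a user-agent string."""
    return [token for token in SUSPICIOUS_TOKENS if token in user_agent]


# Example strings for illustration only.
server_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/110.0.5481.38 Safari/537.36"
)
laptop_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36"
)
print(user_agent_red_flags(server_ua))  # ['HeadlessChrome', 'X11']
print(user_agent_red_flags(laptop_ua))  # []
```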

Russell-Newton commented 1 year ago

I don't have a whole lot of time to look at this for the next couple of weeks, but if you want to help, it would be interesting to try running the library with a headed browser instead of the default headless mode. This won't be possible unless your VM has an X server installed, so no pressure.
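Before trying headed mode on a VM, it helps to check whether a display is available at all. A minimal sketch, assuming a Linux server exposes its display through the DISPLAY (X11) or WAYLAND_DISPLAY environment variables, which are usually unset on a headless EC2 instance. The thread doesn't show which TikTokPy option switches to headed mode, so the hypothetical can_run_headed helper only computes the flag. If no display exists, a virtual X server such as Xvfb (for example, `xvfb-run python bla.py`) can provide one.

```python
import os
import sys


def can_run_headed(environ=None, platform_name=None):
    """Best-effort guess at whether a headed (visible) browser can start."""
    environ = os.environ if environ is None else environ
    platform_name = sys.platform if platform_name is None else platform_name
    if platform_name.startswith("linux"):
        # On Linux a visible browser window needs a display server: X11
        # (DISPLAY) or Wayland (WAYLAND_DISPLAY).
        return bool(environ.get("DISPLAY") or environ.get("WAYLAND_DISPLAY"))
    # Assumption: macOS and Windows desktops provide a display by default.
    return True


headless = not can_run_headed()
print("run headless:", headless)
```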

Russell-Newton commented 1 year ago

@vleg2022 Please try again with version 0.1.12. I've changed the navigation so it no longer causes the issue you saw in https://github.com/Russell-Newton/TikTokPy/issues/33#issuecomment-1442408255.

I also suggest updating all dependencies.

Russell-Newton commented 1 year ago

Comments should now be collectable in version 0.1.13. Firefox navigation has also been deprecated and disabled, as it seems to be incompatible with a lot of the intended features of TikTokPy.

I've also stopped TikTokPy from printing TimeoutErrors as a stacktrace, which may have been confounding your situation.

Furthermore, some of the navigation issues in Docker and server deployments should be fixed. You might have much better success now, so I'm closing this issue. If for some reason you're still having trouble, feel free to reopen it or create a new issue.