davidteather / TikTok-Api

The Unofficial TikTok API Wrapper In Python
https://davidteather.github.io/TikTok-Api
MIT License

[BUG] - Memory overflow #282

Closed: yswtrue closed this issue 4 years ago

yswtrue commented 4 years ago

Describe the bug

When I deploy the service on a server and run getSuggestedUsersbyIDCrawler in a Celery worker, it works fine at first, but memory usage keeps growing over time. I have now limited each worker to 1 GB of memory, but the workers fail with this error: Process 'ForkPoolWorker-780' pid:12156 exited with 'signal 9 (SIGKILL)'. How can I limit the memory usage?

The buggy code

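# Imports were omitted from the original post; the ones below are assumed.
import logging
from datetime import timedelta

from celery import shared_task
from django.utils.timezone import now  # assumed source of now()
from TikTokApi import TikTokApi

from . import models  # assumed location of the Anchor model

logger = logging.getLogger(__name__)

# Created once at import time, so every task run in this process shares it.
api = TikTokApi()

results = 10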

@shared_task(autoretry_for=(Exception,), retry_kwargs={'max_retries': 5}, default_retry_delay=30 * 60)
def tiktok_user_crawler(user_id=None, language='en', proxy=None, count=10):
    if not proxy:
        proxy = None
    if user_id is None:
        anchor = models.Anchor.objects.order_by('-created').first()
        if not anchor:
            return
        users = api.getSuggestedUsersbyIDCrawler(count=count, startingId=anchor.user_id, language=language, proxy=proxy)
    else:
        users = api.getSuggestedUsersbyIDCrawler(count=count, startingId=user_id, language=language, proxy=proxy)
    for user in users:
        username = user.get('subTitle', '').replace('@', '')
        is_exists = models.Anchor.objects.filter(username=username).exists()
        if is_exists and models.Anchor.objects.filter(username=username).first().updated >= now() - timedelta(hours=24):
            continue
        anchor = models.Anchor(
            user_id=user.get('extraInfo', {}).get('userId', ''),
            username=user.get('subTitle', '').replace('@', ''),
            nickname=user.get('title', ''),
            verified=user.get('extraInfo', {}).get('verified', False),
            language=language,
            # fans=int(user.get('extraInfo', {}).get('fans', '0')),
            # likes=int(user.get('extraInfo', {}).get('likes', '0')),
            # following_count=int(user.get('extraInfo', {}).get('fans', '0')),
        )
        tiktok_user = api.getUser(anchor.username, language=language, proxy=proxy)
        anchor.video_count = int(tiktok_user.get('userInfo', {}).get('stats', {}).get('videoCount', '0'))
        anchor.digg_count = int(tiktok_user.get('userInfo', {}).get('stats', {}).get('diggCount', '0'))
        anchor.heart_count = int(tiktok_user.get('userInfo', {}).get('stats', {}).get('heartCount', '0'))
        anchor.following_count = int(tiktok_user.get('userInfo', {}).get('stats', {}).get('followingCount', '0'))
        anchor.follower_count = int(tiktok_user.get('userInfo', {}).get('stats', {}).get('followerCount', '0'))
        anchor.avatar = tiktok_user.get('userInfo', {}).get('avatarLarger', '')
        anchor.save()
        if user.get('extraInfo', {}).get('userId', '') and user.get('extraInfo', {}).get('userId', '') != user_id:
            tiktok_user_crawler.apply_async(kwargs={
                'user_id': user.get('extraInfo', {}).get(
                    'userId', ''), 'language': language, 'proxy': proxy, 'count': count
            }, countdown=60)

Error Trace (if any)

analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [I:pyppeteer.launcher] Browser listening on: ws://127.0.0.1:56935/devtools/browser/6fa793a1-d578-402f-9ffa-fb88fa59b5f1
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:17,463: ERROR/MainProcess] Process 'ForkPoolWorker-9' pid:1063 exited with 'signal 9 (SIGKILL)'
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:17,475: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).')
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | Traceback (most recent call last):
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    |   File "/usr/local/lib/python3.8/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    |     raise WorkerLostError(
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:17,977: ERROR/MainProcess] Process 'ForkPoolWorker-13' pid:4044 exited with 'signal 9 (SIGKILL)'
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:17,989: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).')
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | Traceback (most recent call last):
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    |   File "/usr/local/lib/python3.8/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    |     raise WorkerLostError(
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:18,354: INFO/MainProcess] Received task: seatower_analytics.tiktok.tasks.tiktok_user_crawler[48091c74-890d-4ab6-a8f1-9e37bd55542e]  ETA:[2020-09-30 20:20:33.828565+08:00]
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:18,356: INFO/MainProcess] Received task: seatower_analytics.tiktok.tasks.tiktok_user_crawler[010f1fbe-74d7-49c7-8b13-0aae330d058d]  ETA:[2020-09-30 20:20:34.747167+08:00]
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:18,358: INFO/MainProcess] Received task: seatower_analytics.tiktok.tasks.tiktok_user_crawler[ab3ce521-90d6-47f4-aeae-731f40c8207f]  ETA:[2020-09-30 20:20:35.345027+08:00]
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:18,358: ERROR/MainProcess] Process 'ForkPoolWorker-15' pid:5798 exited with 'signal 9 (SIGKILL)'
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:18,370: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).')
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | Traceback (most recent call last):
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    |   File "/usr/local/lib/python3.8/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    |     raise WorkerLostError(
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:18,840: INFO/MainProcess] Received task: seatower_analytics.tiktok.tasks.tiktok_user_crawler[84b81a59-b9ff-4d02-857f-83a2dfa22ee3]  ETA:[2020-09-30 20:20:36.116548+08:00]
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:18,846: ERROR/MainProcess] Process 'ForkPoolWorker-14' pid:4864 exited with 'signal 9 (SIGKILL)'
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:18,942: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).')
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | Traceback (most recent call last):
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    |   File "/usr/local/lib/python3.8/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    |     raise WorkerLostError(
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [I:pyppeteer.launcher] Browser listening on: ws://127.0.0.1:55094/devtools/browser/5a5a81f5-495b-46cf-9dd3-23ed373d6c93
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:19,951: ERROR/MainProcess] Process 'ForkPoolWorker-17' pid:7042 exited with 'signal 9 (SIGKILL)'
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:20,137: ERROR/MainProcess] Process 'ForkPoolWorker-16' pid:7040 exited with 'signal 9 (SIGKILL)'
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:20,239: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).')
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | Traceback (most recent call last):
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    |   File "/usr/local/lib/python3.8/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    |     raise WorkerLostError(
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:20,243: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).')
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | Traceback (most recent call last):
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    |   File "/usr/local/lib/python3.8/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    |     raise WorkerLostError(
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [I:pyppeteer.launcher] Browser listening on: ws://127.0.0.1:45062/devtools/browser/8baf7ceb-83b1-4887-8854-8054eee0230c
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [I:pyppeteer.launcher] Browser listening on: ws://127.0.0.1:38723/devtools/browser/702d9158-ec60-4c84-a30f-86e2886dbfea
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [I:pyppeteer.launcher] Browser listening on: ws://127.0.0.1:59278/devtools/browser/f0f79828-365d-4f5f-a41d-434f683c5d4d
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:21,845: ERROR/MainProcess] Process 'ForkPoolWorker-24' pid:None exited with 'exitcode None'
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [2020-10-01 00:50:21,939: WARNING/MainProcess] Restoring 31 unacknowledged message(s)
analytics_celeryworker.2.kg2fx63snfad@VM_0_11_centos    | [I:pyppeteer.launcher] Browser listening on: ws://127.0.0.1:44114/devtools/browser/bfaac411-7e9e-4d02-bb37-d1a8c47ce736
analytics_celeryworker.3.kjxznwxx8afz@VM_0_11_centos    | [2020-10-01 03:50:06,226: WARNING/MainProcess] process with pid=1853 already exited
analytics_celeryworker.3.kjxznwxx8afz@VM_0_11_centos    | [2020-10-01 03:50:06,226: ERROR/MainProcess] Process 'ForkPoolWorker-81826' pid:5135 exited with 'signal 9 (SIGKILL)'
analytics_celeryworker.3.kjxznwxx8afz@VM_0_11_centos    | [2020-10-01 03:50:06,237: ERROR/MainProcess] Process 'ForkPoolWorker-81825' pid:5134 exited with 'signal 9 (SIGKILL)'
analytics_celeryworker.3.kjxznwxx8afz@VM_0_11_centos    | [2020-10-01 03:50:06,247: ERROR/MainProcess] Process 'ForkPoolWorker-81824' pid:5133 exited with 'signal 9 (SIGKILL)'
analytics_celeryworker.3.kjxznwxx8afz@VM_0_11_centos    | [2020-10-01 03:50:06,257: ERROR/MainProcess] Process 'ForkPoolWorker-81823' pid:5132 exited with 'signal 9 (SIGKILL)'

davidteather commented 4 years ago

Probably related to #208 at least a little bit.

This API doesn't do much memory management right now, but 1 GB still seems excessive.
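On the Celery side, one common way to contain a slow leak is to recycle pool processes instead of letting them hit a hard cap and get SIGKILLed. The fragment below is only a sketch and was not tested in this thread. It assumes Django-style settings loaded with namespace="CELERY", and the numbers are placeholders to tune.

```python
# settings.py fragment (sketch). Assumes the Celery app loads Django settings
# via app.config_from_object("django.conf:settings", namespace="CELERY").
# The values are illustrative placeholders, not recommendations.

# Replace a pool process after it has executed this many tasks, so memory
# leaked by earlier tasks is released when the old process exits.
CELERY_WORKER_MAX_TASKS_PER_CHILD = 20

# Replace a pool process once its resident memory exceeds this many KiB.
# Celery lets the running task finish first, then swaps the process out.
CELERY_WORKER_MAX_MEMORY_PER_CHILD = 400_000  # roughly 400 MB
```

Note that these limits apply to the Python pool process. A browser launched by pyppeteer runs as a separate process, so its memory may not be counted here.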

I wouldn't recommend getSuggestedUsersbyIDCrawler for large crawling projects. It only returns a few different users who are "trending" at the time. A better solution is probably to extract userIDs from trending, hashtags, and sound methods.
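A rough sketch of that extraction step follows. It assumes each returned item is a dict with an "author" dict that carries an "id" key; that shape is an assumption, so check it against what your version of the API actually returns. The sample data is made up.

```python
from typing import Any, Dict, Iterable, List


def extract_author_ids(tiktoks: Iterable[Dict[str, Any]]) -> List[str]:
    """Return unique author IDs in first-seen order.

    Assumes each item looks like {"author": {"id": ...}}; items that do
    not match that (assumed) shape are skipped instead of raising.
    """
    seen = set()
    ordered = []
    for tiktok in tiktoks:
        author = tiktok.get("author")
        if not isinstance(author, dict):
            continue
        author_id = author.get("id")
        if author_id and author_id not in seen:
            seen.add(author_id)
            ordered.append(author_id)
    return ordered


# Hypothetical sample standing in for a trending/hashtag/sound response.
sample = [
    {"author": {"id": "101", "uniqueId": "alice"}},
    {"author": {"id": "202", "uniqueId": "bob"}},
    {"author": {"id": "101", "uniqueId": "alice"}},  # duplicate author
    {"desc": "item without author data"},
]
unique_ids = extract_author_ids(sample)
```

Deduplicating before queueing follow-up tasks also keeps the crawler from scheduling the same user repeatedly.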

yswtrue commented 4 years ago

OK. Can I get users from a specific country rather than from all countries? I tried passing the language parameter, but I'm not sure it works.

davidteather commented 4 years ago

Theoretically, you can with the language and region parameters, but TikTok doesn't seem to care about those parameters. Your best bet is using a proxy to a country you want.
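As a sketch of the proxy approach, you could keep one proxy per target country and pass it through the proxy keyword already used in the task above. The proxy URLs and the helper name here are hypothetical, and the exact proxy string format the API expects is an assumption.

```python
from typing import Dict, Optional

# Hypothetical proxy endpoints keyed by country code; replace with real ones.
PROXIES_BY_COUNTRY: Dict[str, str] = {
    "US": "http://us-proxy.example.com:8080",
    "JP": "http://jp-proxy.example.com:8080",
}


def proxy_for_country(country_code: str) -> Optional[str]:
    """Return the configured proxy URL for a country, or None if there is none."""
    return PROXIES_BY_COUNTRY.get(country_code.upper())


# Usage sketch, mirroring the proxy keyword from the task above:
#   api.getUser(username, language="ja", proxy=proxy_for_country("jp"))
```

Returning None for an unconfigured country matches the task's existing default of proxy=None.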

davidteather commented 4 years ago

I've made substantial memory optimizations to this API over the last few days. Your code may need updating, but the API is now more flexible and better suited to deployment in a production environment.