Evil0ctal / Douyin_TikTok_Download_API

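🚀 "Douyin_TikTok_Download_API" is an out-of-the-box, high-performance asynchronous data scraping tool for Douyin, Kuaishou, TikTok, and Bilibili. It supports API calls as well as online batch parsing and downloading.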
https://douyin.wtf
Apache License 2.0
8.82k stars 1.38k forks

[BUG] Error when scraping video metadata from Douyin with the hybrid_parsing method #434

Closed iDataist closed 3 months ago

iDataist commented 3 months ago

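Platform where the error occurred: Douyin

Endpoint where the error occurred: I did not use an endpoint. I used douyin-tiktok-scraper 1.2.9 directly.

Submitted input value: Video link: https://www.douyin.com/video/7105012432434777356

Have you tried again? Yes, the error persists after multiple attempts.

Have you checked this project's README or API documentation? Yes, and I am confident the problem is caused by the program.

Description: I am using the douyin_tiktok_scraper library to scrape video metadata such as the title, view count, and like count from a given URL. I am running it on an M1 Mac with Python 3.10.11. The code snippet and the error output are below.

Code: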

import asyncio
from douyin_tiktok_scraper.scraper import Scraper

api = Scraper()
url = 'https://www.douyin.com/video/7105012432434777356'

async def hybrid_parsing(url: str) -> dict:
    # Hybrid parsing(Douyin/TikTok URL)
    result = await api.hybrid_parsing(url)
    print(f"The hybrid parsing result:\n {result}")
    return result

asyncio.run(hybrid_parsing(url=url))

Error output:

正在解析**douyin**视频链接...
该链接为原始链接,无需转换,原始链接为: https://www.douyin.com/video/7105012432434777356
获取到的**douyin**视频ID是7105012432434777356
Traceback (most recent call last):
  File "/Users/***/.douyin/lib/python3.10/site-packages/douyin_tiktok_scraper/scraper.py", line 280, in get_douyin_video_data
    api_url = self.generate_x_bogus_url(api_url)
  File "/Users/***/.douyin/lib/python3.10/site-packages/douyin_tiktok_scraper/scraper.py", line 226, in generate_x_bogus_url
    xbogus = execjs.compile(open(self.relpath('./X-Bogus.js')).read()).call('sign', query,
  File "/Users/***/.douyin/lib/python3.10/site-packages/execjs/_abstract_runtime_context.py", line 37, in call
    return self._call(name, *args)
  File "/Users/***/.douyin/lib/python3.10/site-packages/execjs/_external_runtime.py", line 92, in _call
    return self._eval("{identifier}.apply(this, {args})".format(identifier=identifier, args=args))
  File "/Users/***/.douyin/lib/python3.10/site-packages/execjs/_external_runtime.py", line 78, in _eval
    return self.exec_(code)
  File "/Users/***/.douyin/lib/python3.10/site-packages/execjs/_abstract_runtime_context.py", line 18, in exec_
    return self._exec_(source)
  File "/Users/***/.douyin/lib/python3.10/site-packages/execjs/_external_runtime.py", line 85, in _exec_
    output = self._exec_with_tempfile(source)
  File "/Users/***/.douyin/lib/python3.10/site-packages/execjs/_external_runtime.py", line 127, in _exec_with_tempfile
    self._fail_on_non_zero_status(ret, stdoutdata, stderrdata)
  File "/Users/***/.douyin/lib/python3.10/site-packages/execjs/_external_runtime.py", line 134, in _fail_on_non_zero_status
    raise ProcessExitedWithNonZeroStatus(status=status, stdout=stdoutdata, stderr=stderrdata)
execjs._exceptions.ProcessExitedWithNonZeroStatus: (101, '', 'Warning: The jjs tool is planned to be removed from a future JDK release\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:483:16 Expected ; but found a\n            let a = !1;\n                ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:524:8 Expected an operand but found )\n    }, () => 0, () => "03v", {\n        ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:526:5 Expected an operand but found ,\n    }, {\n     ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:528:22 Expected ; but found :\n        msNewTokenList: [],\n                      ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:530:17 Expected ; but found :\n        clickList: [],\n                 ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:532:19 Expected ; but found :\n        activeState: [],\n                   ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:534:15 Expected ; but found :\n        envcode: 0,\n               ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:536:16 Expected ; but found :\n        msStatus: 0,\n                ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:538:13 Expected ; but found :\n        ttwid: "",\n             ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:540:19 Expected ; but found :\n        tt_webid_v2: ""\n                   ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:543:14 Expected , but found =>\n    }, (e, b) => {\n              ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:544:12 Expected ; but found a\n        let a = new Uint8Array(3);\n            ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:545:8 Invalid return statement\n        return a[0] = e / 256, a[1] = e % 256, a[2] = b % 256, String.fromCharCode.apply(null, a)\n        ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:546:4 Expected eof but found }\n    }, (e, b) => {\n    ^\n')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/***/.douyin/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 114, in __call__
    result = await fn(*args, **kwargs)
  File "/Users/***/.douyin/lib/python3.10/site-packages/douyin_tiktok_scraper/scraper.py", line 293, in get_douyin_video_data
    raise ValueError(f"获取抖音视频数据出错了: {e}")
ValueError: 获取抖音视频数据出错了: (101, '', 'Warning: The jjs tool is planned to be removed from a future JDK release\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:483:16 Expected ; but found a\n            let a = !1;\n                ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:524:8 Expected an operand but found )\n    }, () => 0, () => "03v", {\n        ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:526:5 Expected an operand but found ,\n    }, {\n     ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:528:22 Expected ; but found :\n        msNewTokenList: [],\n                      ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:530:17 Expected ; but found :\n        clickList: [],\n                 ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:532:19 Expected ; but found :\n        activeState: [],\n                   ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:534:15 Expected ; but found :\n        envcode: 0,\n               ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:536:16 Expected ; but found :\n        msStatus: 0,\n                ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:538:13 Expected ; but found :\n        ttwid: "",\n             ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:540:19 Expected ; but found :\n        tt_webid_v2: ""\n                   ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:543:14 Expected , but found =>\n    }, (e, b) => {\n              ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:544:12 Expected ; but found a\n        let a = new Uint8Array(3);\n            ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:545:8 Invalid return statement\n        return a[0] = e / 256, a[1] = e % 256, a[2] = b % 256, String.fromCharCode.apply(null, a)\n        ^\n/var/folders/1l/80bgl5gj0z30lxdxs2zrv44r0000gn/T/execjsjiugj65j.js:546:4 Expected eof but found }\n    }, (e, b) => {\n    ^\n')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/***/Downloads/digital_human/douyin_tiktok.py", line 12, in <module>
    asyncio.run(hybrid_parsing(url=url))
  File "/usr/local/Cellar/python@3.10/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/Cellar/python@3.10/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/Users/***/Downloads/digital_human/douyin_tiktok.py", line 8, in hybrid_parsing
    result = await api.hybrid_parsing(url)
  File "/Users/***/.douyin/lib/python3.10/site-packages/douyin_tiktok_scraper/scraper.py", line 467, in hybrid_parsing
    data = await self.get_douyin_video_data(video_id) if url_platform == 'douyin' \
  File "/Users/***/.douyin/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 185, in async_wrapped
    return await fn(*args, **kwargs)
  File "/Users/***/.douyin/lib/python3.10/site-packages/tenacity/asyncio/__init__.py", line 111, in __call__
    do = await self.iter(retry_state=retry_state)
  File "/Users/***/.douyin/lib/python3.10/site-packages/tenacity/_utils.py", line 99, in inner
    return call(*args, **kwargs)
  File "/Users/***/.douyin/lib/python3.10/site-packages/tenacity/__init__.py", line 413, in exc_check
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x10d5a9fc0 state=finished raised ValueError>]
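
The final RetryError in the chained tracebacks hides the original failure: it is raised "from" the ValueError, which was itself raised while handling the execjs error. The following is a minimal sketch, using only the standard library, of walking Python's exception chain back to the first exception. The helper name `root_cause` is hypothetical and is not part of douyin_tiktok_scraper, tenacity, or execjs.

```python
def root_cause(exc: BaseException) -> BaseException:
    """Follow __cause__ / __context__ links back to the earliest exception."""
    seen = set()
    while True:
        # __cause__ is set by `raise ... from ...`; __context__ is set
        # implicitly when an exception is raised inside an except block.
        nxt = exc.__cause__ or exc.__context__
        if nxt is None or id(nxt) in seen:
            return exc
        seen.add(id(exc))
        exc = nxt


# Stand-in chain that mimics the shape of the traceback above.
try:
    try:
        try:
            raise OSError("JavaScript runtime exited with non-zero status")
        except OSError as inner:
            raise ValueError(f"wrapped: {inner}")
    except ValueError as middle:
        raise RuntimeError("retries exhausted") from middle
except RuntimeError as top:
    print(type(root_cause(top)).__name__, "-", root_cause(top))
```

Wrapping the `asyncio.run(...)` call in a `try`/`except` and printing the root cause in this way may make the underlying runtime error easier to spot than the full chained output.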

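Steps to reproduce:

  1. Install the douyin_tiktok_scraper library.
  2. Use the code snippet provided above.
  3. Run the script on an M1 Mac with Python 3.10.11.

Expected behavior: The script should return the metadata for the given video URL without errors.

Actual behavior: The script raises a ProcessExitedWithNonZeroStatus and a ValueError, which points to a problem in the execjs compilation step and in the subsequent handling inside the get_douyin_video_data method.

Additional information: The error message reports multiple syntax errors in the temporary JavaScript file generated by execjs, such as "Expected ; but found a", "Expected an operand but found )", and "Invalid return statement".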
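
One detail in the stderr text is worth noting: it begins with a warning from jjs, the JDK's Nashorn JavaScript shell, which by default accepts only ES5 syntax. That would account for the parse errors on `let` and on arrow functions. The following is a minimal diagnostic sketch using only the standard library. The helper name `diagnose_js_runtime` is hypothetical, and the idea that having Node.js on PATH would lead execjs to choose it instead is an assumption that this thread does not confirm.

```python
import shutil


def diagnose_js_runtime(stderr_text: str) -> str:
    """Return a short hint derived from the JavaScript runtime's stderr."""
    if "jjs tool" not in stderr_text:
        return "No jjs warning found; the failure likely has another cause."
    hint = "The script ran under jjs (Nashorn), which rejects ES6 syntax by default."
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("node") is None:
        hint += " Node.js was not found on PATH."
    else:
        hint += " Node.js is on PATH, so check which runtime execjs selects."
    return hint


print(diagnose_js_runtime(
    "Warning: The jjs tool is planned to be removed from a future JDK release"
))
```

PyExecJS also documents an `EXECJS_RUNTIME` environment variable for forcing a particular runtime; whether that is enough to get past this error with the unmaintained package is untested here.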

Please investigate this issue and provide a solution. Thank you!

Evil0ctal commented 3 months ago

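This PyPI package is no longer maintained. You can clone this project, edit the config.yaml file as described in the README, and then import the relevant crawler into your own project.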
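
One way to make a cloned checkout importable from another project is to put its directory on `sys.path` before importing from it. The sketch below uses only the standard library. The clone location and the helper name `add_clone_to_path` are hypothetical, and the crawler module to import afterwards is whatever the README names; it is deliberately not shown here.

```python
import sys
from pathlib import Path


def add_clone_to_path(clone_dir: str) -> bool:
    """Put a local clone of the repository on sys.path; return False if missing."""
    path = Path(clone_dir).expanduser().resolve()
    if not path.is_dir():
        return False
    if str(path) not in sys.path:
        sys.path.insert(0, str(path))
    return True


# Hypothetical location of the clone; adjust to wherever `git clone` put it.
if add_clone_to_path("~/src/Douyin_TikTok_Download_API"):
    print("Clone is importable; next, import the crawler named in the README.")
else:
    print("Clone directory not found.")
```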

iDataist commented 3 months ago

Thanks, @Evil0ctal. Is there a way to set up config.yaml automatically, or do I have to look up the values in the browser's developer tools?

Evil0ctal commented 3 months ago

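> Thanks, @Evil0ctal. Is there a way to set up config.yaml automatically, or do I have to look up the values in the browser's developer tools?

Take a look at the README; it includes a video. For now the file can only be configured manually, not automatically. If you need to scrape data at scale, you could also consider our commercial version.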

iDataist commented 3 months ago

Thanks for the answer, @Evil0ctal. The video explains it very clearly.