Quan666 / ELF_RSS

QQ bot RSS subscription plugin; RSSHub is recommended as the feed source
https://myelf.club/archives/221
GNU General Public License v3.0
549 stars 55 forks

Error when fetching a Discuz! X3.4 forum board #376

Closed: liuzj288 closed this 1 year ago

liuzj288 commented 1 year ago

ELF_RSS, go-cqhttp, nonebot and Python versions, and operating system

nonebot 2.0.0b5, Python 3.10.1, Windows 10 LTSC, ELF-RSS v2.6.12

List of installed Python packages

No response

How to reproduce

@bot add 悠闲数学娱乐论坛 http://kuing.infinityfreeapp.com/forum.php?mod=rss&fid=5&auth=0

Expected behavior

No response

Actual behavior

10-24 03:01:28 [INFO] ELF_RSS | 悠闲数学娱乐论坛 检查更新
10-24 03:01:29 [ERROR] apscheduler | Job "check_update (trigger: interval[0:05:00], next run at: 2022-10-24 03:06:36 CST)" raised an exception
Traceback (most recent call last):
  File "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\Scripts\nb.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\click\core.py", line 760, in invoke
    return callback(*args, **kwargs)
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\nb_cli\commands\main.py", line 30, in run
    run_bot(file, app)
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\nb_cli\handlers\deploy.py", line 25, in run_bot
    nonebot.run(app=f"{module_name}:{app}")
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\nonebot\__init__.py", line 261, in run
    get_driver().run(*args, **kwargs)
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\nonebot\drivers\fastapi.py", line 170, in run
    uvicorn.run(
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\uvicorn\main.py", line 576, in run
    server.run()
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\uvicorn\server.py", line 60, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 633, in run_until_complete
    self.run_forever()
  File "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\asyncio\windows_events.py", line 321, in run_forever
    super().run_forever()
  File "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 600, in run_forever
    self._run_once()
  File "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 1896, in _run_once
    handle._run()
  File "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\apscheduler\executors\base_py3.py", line 30, in run_coroutine_job
    retval = await job.func(*job.args, **job.kwargs)
  File "C:\Users\Alex\Documents\LittlePaimon.\ELF_RSS\src\plugins\ELF_RSS2\my_trigger.py", line 20, in check_update
    await asyncio.wait_for(rss_parsing.start(rss), timeout=wait_for)
  File "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\asyncio\tasks.py", line 445, in wait_for
    return fut.result()
  File "C:\Users\Alex\Documents\LittlePaimon.\ELF_RSS\src\plugins\ELF_RSS2\rss_parsing.py", line 95, in start
    await pr.start(rss_name=rss.name, new_rss=new_rss)
  File "C:\Users\Alex\Documents\LittlePaimon.\ELF_RSS\src\plugins\ELF_RSS2\parsing\parsing_rss.py", line 158, in start
    rss_title = new_rss["feed"]["title"]
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\feedparser\util.py", line 113, in __getitem__
    return dict.__getitem__(self, key)
KeyError: 'title'

2022-10-24 03:01:29 ERROR Job "check_update (trigger: interval[0:05:00], next run at: 2022-10-24 03:06:36 CST)" raised an exception
Traceback (most recent call last):
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\apscheduler\executors\base_py3.py", line 30, in run_coroutine_job
    retval = await job.func(*job.args, **job.kwargs)
  File "C:\Users\Alex\Documents\LittlePaimon.\ELF_RSS\src\plugins\ELF_RSS2\my_trigger.py", line 20, in check_update
    await asyncio.wait_for(rss_parsing.start(rss), timeout=wait_for)
  File "C:\Users\Alex\AppData\Local\Programs\Python\Python310\lib\asyncio\tasks.py", line 445, in wait_for
    return fut.result()
  File "C:\Users\Alex\Documents\LittlePaimon.\ELF_RSS\src\plugins\ELF_RSS2\rss_parsing.py", line 95, in start
    await pr.start(rss_name=rss.name, new_rss=new_rss)
  File "C:\Users\Alex\Documents\LittlePaimon.\ELF_RSS\src\plugins\ELF_RSS2\parsing\parsing_rss.py", line 158, in start
    rss_title = new_rss["feed"]["title"]
  File "C:\Users\Alex\AppData\Local\pypoetry\Cache\virtualenvs\littlepaimon-Sqv748AN-py3.10\lib\site-packages\feedparser\util.py", line 113, in __getitem__
    return dict.__getitem__(self, key)
KeyError: 'title'
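The crash itself is a plain KeyError: parsing_rss.py indexes new_rss["feed"]["title"] directly, and when the response is not a real feed, feedparser can return a feed object with no title at all. A minimal defensive sketch, assuming a hypothetical helper get_feed_title that falls back to the subscription name (feedparser's FeedParserDict is a dict subclass, so .get works on it). This only stops the job from crashing; it does not make the feed readable.

```python
from typing import Any, Mapping


def get_feed_title(parsed: Mapping[str, Any], fallback: str) -> str:
    """Return the feed title, or `fallback` if the parsed document has none.

    `parsed` is meant to be the result of feedparser.parse(); a plain dict
    with the same shape works as well, which keeps the helper easy to test.
    """
    # Treat a missing or empty "feed" entry as an empty mapping
    feed = parsed.get("feed") or {}
    title = feed.get("title")
    # As in this issue, a non-feed response can parse without any title
    return title if title else fallback
```

For example, get_feed_title({"feed": {}}, "悠闲数学娱乐论坛") returns the subscription name instead of raising.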

Quan666 commented 1 year ago

ELF_RSS version?

NekoAria commented 1 year ago

This is most likely unsolvable. I'm not sure whether Discuz! is deliberately designed this way or the site is configured like this:

When you request this so-called RSS feed URL, the first response is not XML at all but an HTML page containing a script; that step sets a cookie and then redirects.
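One way to catch this before handing the body to feedparser is to check whether it starts like an HTML document rather than XML. A rough heuristic sketch; the helper name looks_like_html_page and the prefixes it checks are assumptions, not something ELF_RSS or Discuz! defines:

```python
def looks_like_html_page(body: bytes) -> bool:
    """Return True if a response body starts like an HTML document.

    RSS/Atom feeds normally begin with an XML declaration or a <rss>/<feed>
    element, so an <html> or <!DOCTYPE html> prefix suggests the server sent
    a web page (for example a script-based cookie check) instead of a feed.
    """
    # Drop a UTF-8 BOM and leading whitespace, then look at a short prefix
    head = body.removeprefix(b"\xef\xbb\xbf").lstrip()[:64].lower()
    return head.startswith((b"<!doctype html", b"<html"))
```

A fetcher could log a clearer message (for example, suggesting a cookie) when this returns True, instead of failing later on a missing title.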

I also don't know exactly how the cookie is obtained: it doesn't come only from that script, part of it is also assigned by the server.

I also tried subscribing to this URL in Inoreader, and it couldn't read the feed either.

There are probably two workarounds, but neither seems to work:

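  1. Emulate a browser and generate an alternative feed URL through RSSHub. I tried it and it didn't work; I'm not sure whether it also needs a logged-in cookie or RSSHub simply doesn't support this Discuz! version.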

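  2. Get the cookie from a browser and add the logged-in cookie to this subscription with the change command. A cookie from a non-logged-in session doesn't seem to work. I don't have an account on this forum, so I can't test a logged-in cookie.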
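For option 2, a cookie copied from the browser is a single "name=value; name=value" string. If an HTTP client API expects a mapping instead, it can be split as below; parse_cookie_string is a hypothetical helper, and sending the raw string as a Cookie header (as the demo later in this thread does) avoids the conversion entirely.

```python
def parse_cookie_string(cookie: str) -> dict[str, str]:
    """Split a browser-style 'a=1; b=2' Cookie header value into a dict.

    Values are kept exactly as copied (no URL-decoding), so they can be sent
    back unchanged. Empty segments and segments without '=' are skipped.
    """
    cookies: dict[str, str] = {}
    for part in cookie.split(";"):
        # Split on the first '=' only, so values containing '=' survive
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies
```

Percent-encoded values such as zsbn_2132_lastact=1666614769%09forum.php%09rss are left encoded on purpose, since the server set them that way.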

liuzj288 commented 1 year ago

ELF_RSS version

ELF-RSS v2.6.12

NekoAria commented 1 year ago

I wrote a demo to test it; it looks like the cookie-setting logic is wrong:

import httpx
import feedparser

HEADERS = {
    "Accept": "application/xhtml+xml,application/xml,*/*",
    "Accept-Language": "en-US,en;q=0.9",
    "Cache-Control": "max-age=0",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36",
    "Connection": "keep-alive",
    "Content-Type": "application/xml; charset=utf-8",
}
headers = HEADERS.copy()
# Cookie string for the forum, sent verbatim as the Cookie request header
cookie = "_test=e17a9e35685651b88308f957e6bd2715; zsbn_2132_saltkey=fHV2ttin; zsbn_2132_lastvisit=1666551060; zsbn_2132_sid=R9yrY9; zsbn_2132_lastact=1666614769%09forum.php%09rss; zsbn_2132_st_t=0%7C1666559192%7C72b7ae796319e6a01d2f623a2be76187; zsbn_2132_forum_lastvisit=D_5_1666559192; zsbn_2132_visitedfid=5"
headers["Cookie"] = cookie
with httpx.Client(headers=headers) as session:
    url = "http://kuing.infinityfreeapp.com/forum.php?mod=rss&fid=5&auth=0"
    resp = session.get(url)
    # Pass the raw bytes so feedparser can detect the XML encoding itself
    d = feedparser.parse(resp.content)
    print(d["feed"]["title"])  # 悠闲数学娱乐论坛(第3版) - 初等数学讨论

The logic in set_cookies and fetch_rss needs to be changed.
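A minimal sketch of the change the demo points to: build the request headers from a base set and pass the subscription's cookie string through unchanged as the Cookie header. build_feed_headers is a hypothetical name, and the actual structure of set_cookies and fetch_rss in ELF_RSS isn't shown in this thread, so this is an illustration, not a drop-in patch.

```python
from typing import Mapping, Optional


def build_feed_headers(base: Mapping[str, str], cookie: Optional[str]) -> dict[str, str]:
    """Return a copy of `base` with the raw cookie string as the Cookie header.

    The base mapping is never mutated, so a shared default header dict
    (like HEADERS in the demo) stays clean between subscriptions.
    """
    headers = dict(base)
    # Only set the header when the subscription actually has a cookie
    if cookie and cookie.strip():
        headers["Cookie"] = cookie.strip()
    return headers
```

The result can be passed straight to httpx.Client(headers=...), which is what the demo does by hand with HEADERS.copy().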

liuzj288 commented 1 year ago

"Tried it with a demo; it looks like the cookie-setting logic is wrong:"

Which file should I modify?