Kyomotoi / ATRI

A project for ATRI, using go-cqhttp and Nonebot2.
https://atri.imki.moe
GNU General Public License v3.0
724 stars, 87 forks

Anti-crawler problem in the setu.py module and a possible fix #3

Closed Xiaodx912 closed 4 years ago

Xiaodx912 commented 4 years ago

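While trying out the setu.py module, I ran into a problem where the bot sends no reply. Checking Mirai's runtime log, I found errors of the form 18:00:22 [ERROR] [CQHTTPMirai] java.io.IOException: Server returned HTTP response code: 403 for URL: https://XXX.jpg at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1900) ... Comparing this with the symptoms described in this issue, my preliminary conclusion is that the anti-crawler mechanism of the pixiv reverse-proxy site returned the 403.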

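In my tests, taking a URL returned by the API such as url = 'https://i.pixiv.cat/img-original/img/2017/08/20/21/44/35/64531410_p0.jpg' and replacing it with new_url = 'https://pixiv.cat/'+url.split('/')[-1].split('_')[0]+'.jpg' works around the problem for now. However, this method does not carry over the image's page number, and the anti-crawler mechanism could block it again at any time.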
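The rewrite described above can be sketched as a small helper. This is only an illustration of the same string manipulation, not code from setu.py; the function name to_pid_url and the digit check are assumptions added for the example.

```python
from urllib.parse import urlsplit


def to_pid_url(url: str) -> str:
    """Rewrite an i.pixiv.cat original-image URL into the pixiv.cat/<pid>.jpg form.

    Hypothetical helper: it mirrors the one-line workaround from this comment
    and, like it, discards the page number.
    """
    # Last path segment, e.g. "64531410_p0.jpg"
    filename = urlsplit(url).path.rsplit("/", 1)[-1]
    # Everything before the first underscore is the illustration ID
    pid = filename.split("_", 1)[0]
    if not pid.isdigit():
        raise ValueError(f"unexpected file name in URL: {url!r}")
    return f"https://pixiv.cat/{pid}.jpg"


if __name__ == "__main__":
    print(to_pid_url(
        "https://i.pixiv.cat/img-original/img/2017/08/20/21/44/35/64531410_p0.jpg"
    ))
```

Raising on an unexpected file name, rather than sending a malformed URL, makes a change in the API's URL format show up in the bot's own log instead of as another silent 403.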

Kyomotoi commented 4 years ago

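May I ask whether you applied for an API key? This endpoint should not fail or trigger anti-crawler blocking until the rate limit is reached. API details: https://api.lolicon.app/#/setu My current plan is to ship an API inside the bot's own directory, removing the dependency on other setu APIs. Images will be uploaded through the bot, and the usage will be explained clearly in the rewritten deployment guide. The bot is still under construction, so bugs like this one are hard to avoid; thanks for your understanding.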

Xiaodx912 commented 4 years ago

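I applied for a key, and the API itself works fine. However, links returned by the API of the form "url":"https:\/\/i.pixiv.cat\/img-original\/img\/2018\/11\/02\/17\/30\/00\/71469705_p0.png" appear to be protected by anti-crawler measures.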

 09:37:27 [DEBUG] [CQHTTPMirai] {"action": "send_msg", "params": {"user_id": <QQID>, "message_type": "private", "message": "Title: \u011f\u0178\u0152\u00b8\nPid: 77949068\n[CQ:image,file=https://i.pixiv.cat/img-original/img/2019/11/23/14/13/05/77949068_p0.png]\n---------------\n\u5b8c\u6210\u65f6\u95f4:0.42s"}, "echo": {"seq": 1}}
 09:37:28 [ERROR] [CQHTTPMirai] java.io.IOException: Server returned HTTP response code: 403 for URL: https://i.pixiv.cat/img-original/img/2019/11/23/14/13/05/77949068_p0.png
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1900)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
        at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:268)
        at net.mamoe.mirai.utils.FileCacheStrategy$TempCache.newImageCache(FileCacheStrategy.jvm.kt:183)
        at net.mamoe.mirai.utils.FileCacheStrategy$PlatformDefault.newImageCache(FileCacheStrategy.jvm.kt)
        at net.mamoe.mirai.utils.internal.DeferredReusableInput$init$3.invokeSuspend(DeferredReusableInput.jvm.kt:26)
        at net.mamoe.mirai.utils.internal.DeferredReusableInput$init$3.invoke(DeferredReusableInput.jvm.kt)
        at kotlinx.coroutines.intrinsics.UndispatchedKt.startUndispatchedOrReturn(Undispatched.kt:91)
        at kotlinx.coroutines.BuildersKt__Builders_commonKt.withContext(Builders.common.kt:160)
        at kotlinx.coroutines.BuildersKt.withContext(Unknown Source)
        at net.mamoe.mirai.utils.internal.DeferredReusableInput.init(DeferredReusableInput.jvm.kt:18)
        at net.mamoe.mirai.qqandroid.contact.FriendImpl.uploadImage(FriendImpl.kt:104)
        at net.mamoe.mirai.utils.ExternalImageKt.upload(ExternalImage.kt:124)
        at net.mamoe.mirai.message.SendImageUtilsJvmKt.uploadAsImage(SendImageUtilsJvm.kt:95)
        at tech.mihoyo.mirai.util.CQMessgeParserKt$convertToMiraiMessage$$inlined$with$lambda$2.invokeSuspend(CQMessgeParser.kt:314)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:56)
        at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:571)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:738)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:678)
        at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:665)

 09:37:28 [DEBUG] [CQHTTPMirai] {"status":"failed","retcode":103,"data":null,"echo":{"seq":1}}

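The miraiOK log above shows that setu.py returned a correct CQ code, but sending the message then failed. After applying my preliminary fix, the log of a successful send looks like this.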

 09:46:13 [DEBUG] [CQHTTPMirai] {"action": "send_msg", "params": {"user_id": <QQID>, "message_type": "private", "message": "Title: \u30ab\u30fc\u30de\nPid: 74279762\n[CQ:image,file=https://pixiv.cat/74279762.jpg]\n---------------\n\u5b8c\u6210\u65f6\u95f4:0.434s"}, "echo": {"seq": 4}}
 09:46:15 [INFO] [NETWORK] Send: LongConn.OffPicUp
 09:46:16 [INFO] [NETWORK] Recv: FileExists(resourceId=/18446744072664718378-4024723735-5F9DEB060E11B32A8E2C9190D3494D7F, imageInfo=net.mamoe.mirai.qqandroid.network.protocol.data.proto.Cmd0x352$ImgInfo@5efb855a)
 09:46:16 [INFO] [NETWORK] Send: MessageSvc.PbSendMsg
 09:46:17 [INFO] [NETWORK] Recv: MessageSvcPbSendMsg.Response.SUCCESS
 09:46:17 [INFO] [BOT <BOTID>] Friend(<QQID>) <- Title: カーマ\nPid: 74279762\n[mirai:image:/18446744072664718378-4024723735-5F9DEB060E11B32A8E2C9190D3494D7F]\n---------------\n完成时间:0.434s
 09:46:17 [DEBUG] [CQHTTPMirai] {"status":"ok","retcode":0,"data":{"type":"MessageData","message_id":22917},"echo":{"seq":4}}

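I have now added

    # Zero-based page index parsed from the file name, e.g. "64531410_p0.jpg" -> 0
    pg_n = int(dc["data"][0]["url"].split('/')[-1].split('_p')[1].split('.')[0])
    # Turn it into a one-based "-N" suffix; page 0 gets no suffix
    if pg_n != 0:
        pg_n = '-' + str(pg_n + 1)
    else:
        pg_n = ''

to restore the page number that was missing from the target URL. However, sending still fails when the image URL returned by the API is the first page of a multi-page set.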
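A possible way to combine the ID rewrite and the page suffix is sketched below. The function name to_paged_url and the multi_page flag are hypothetical. The "-1" suffix for the first page of a multi-page set is an assumption extrapolated from the "-N" pattern used above for later pages, and how the caller would learn that a work has several pages is not covered in this thread.

```python
import re

# Matches the trailing "<pid>_p<page>.<ext>" part of an original-image URL
_FILENAME = re.compile(r"(?P<pid>\d+)_p(?P<page>\d+)\.\w+$")


def to_paged_url(url: str, multi_page: bool = False) -> str:
    """Build a pixiv.cat URL that keeps the page number (hypothetical helper).

    multi_page is an assumed flag supplied by the caller; the URL alone
    cannot tell whether "_p0" belongs to a single image or to a set.
    """
    match = _FILENAME.search(url)
    if match is None:
        raise ValueError(f"unexpected file name in URL: {url!r}")
    pid = match["pid"]
    page = int(match["page"])  # zero-based in the source URL
    if page > 0 or multi_page:
        # One-based suffix, same convention as the snippet above
        return f"https://pixiv.cat/{pid}-{page + 1}.jpg"
    return f"https://pixiv.cat/{pid}.jpg"
```

With multi_page left at its default, the helper behaves like the snippet above, so the first-page failure remains unless the caller can supply that extra piece of information.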

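In addition, when the bot runs on Linux rather than Windows, the file paths hard-coded in the plugins all break because the two systems use different path formats. The files known to be affected so far are switch.json and sqlite.db.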
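One common way to avoid this class of problem is to build paths with pathlib instead of hard-coding separators. The sketch below is not the project's actual fix; the helper name data_file and the "ATRI/data" directory layout are assumptions made for the example.

```python
from pathlib import Path


def data_file(base_dir, *parts: str) -> Path:
    """Join path components using the separator of the current OS."""
    return Path(base_dir).joinpath(*parts)


# Hypothetical layout; a plugin would more likely anchor on
# Path(__file__).resolve().parent than on the working directory.
BASE_DIR = Path.cwd()
SWITCH_FILE = data_file(BASE_DIR, "ATRI", "data", "switch.json")
SQLITE_FILE = data_file(BASE_DIR, "ATRI", "data", "sqlite.db")

if __name__ == "__main__":
    # Prints backslash-separated paths on Windows, slash-separated on Linux
    print(SWITCH_FILE)
    print(SQLITE_FILE)
```

Because every component is passed as a separate string, no separator appears in the source code, and the same plugin file works unchanged on both systems.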

In any case, thank you for the work you have put into the ATRI project.

Kyomotoi commented 4 years ago

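Thanks for the support!!! As for the path issue, I expect to push an update tonight; it is being tested right now. If you would like to modify the code of this project, you can fork it, make your changes, and submit them. Thanks again for your support; it gives me the motivation to keep writing this mess of a codebase!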